Sampling from a Random Variable

Description of the problem

Loosely speaking, a random variable $X$ is described by its probability density function (PDF) $p\colon \Omega\to [0,\infty)$ (sometimes loosely referred to as a probability measure or as a (probability) distribution) defined on a measurable space $\Omega$. We write $X\sim p$ if $X$ is a random variable taking values in $\Omega$ with PDF $p$. Today, I’ll keep things simple and work with random variables which take values in the set of real numbers and which have a density. These are usually referred to as continuous random variables.

Here is the problem I am going to tackle today: We would like to take a finite sample of numbers $x_1,\ldots,x_N$ in such a way that the number of samples $N_{a,b}$ in an interval $[a,b]$ is roughly $N$ times the probability of the random variable $X$ being in $[a,b]$. This number can be written as an integral \[ N_{a,b} \approx N \int_a^b p(x) dx \]

Uniform random sampling

Simulating random numbers, or random sampling from a finite collection of elements, on a computer is a tricky business. What I am going to do today assumes that I have a good uniform random sampling simulation.

The uniform PDF is a density function that looks the same at every point. So, if I want a uniform random variable on the interval $[a,b]$, my PDF should be $p(x) = 0$ when $x$ is outside of the interval $[a,b]$, and $p(x)$ must be the constant $\frac{1}{b-a}$ when $x$ is in $[a,b]$. The choice of the constant is dictated by the fact that $p(x)$ should integrate to 1 over the whole space. To simulate sampling from a uniform random variable on $[0,1]$ I am going to use lisp’s random function, which returns a non-negative float strictly less than its argument:

(loop repeat 10 collect (random 1.0))

(0.24052 0.6231128 0.20052421 0.74887836 0.345209 0.107444644 0.18069339
 0.8515891 0.5105244 0.76619613)
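As a quick sanity check of the counting property from the problem description, here is a minimal sketch that draws uniform numbers and measures the fraction that falls into a sub-interval. The helper names random-uniform and fraction-in-interval are hypothetical and introduced only for this illustration; only random is standard.

```lisp
;; Hypothetical helper: a uniform draw from the interval between A and B.
;; Assumes a < b and that at least one of them is a float.
(defun random-uniform (a b)
  (+ a (random (- b a))))

;; Fraction of SAMPLES that fall in the interval [lo,hi].
(defun fraction-in-interval (samples lo hi)
  (/ (count-if (lambda (x) (<= lo x hi)) samples)
     (length samples)))

;; For the uniform PDF on [0,1] the fraction in [0.2,0.5] should be
;; close to the interval length 0.3 (the exact value varies per run).
(let ((samples (loop repeat 100000 collect (random 1.0d0))))
  (float (fraction-in-interval samples 0.2d0 0.5d0)))
```

Multiplying the returned fraction by the sample size gives the count $N_{a,b}$ from the problem description.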

Sampling from a specific random variable

OK. Assuming we have a good uniform sampling simulation, how can we simulate sampling from a random variable $X\sim p$?

Now, each PDF $p(x)$ has a cumulative distribution function (CDF) defined via an integral \[ c(x) = \int_{-\infty}^x p(t)dt \] Since the PDF $p(t)$ is non-negative, the function defined by the integral is non-decreasing, and it is strictly increasing (hence invertible) wherever $p$ is positive. So the CDF $c(x)$ is invertible if $p(x)$ is nice enough. Moreover, if $u_1,\ldots,u_N$ is a sample from the uniform distribution on $[0,1]$, the numbers \[ c^{-1}(u_1),\ldots,c^{-1}(u_N) \] are a sample from $X\sim p$. Indeed, if $U$ is uniform on $[0,1]$ then the probability that $c^{-1}(U)\leq x$ equals the probability that $U\leq c(x)$, which is $c(x)$.
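When the inverse of the CDF has a closed formula, the recipe is a one-liner. The sketch below uses the exponential distribution, which is an example I picked for illustration and is not part of the running example; sample-exponential is a hypothetical name.

```lisp
;; Inverse-CDF sampling for the exponential distribution with the given RATE.
;; PDF: p(x) = rate * exp(-rate * x) for x >= 0
;; CDF: c(x) = 1 - exp(-rate * x), so c^{-1}(u) = -log(1 - u) / rate
(defun sample-exponential (rate)
  (let ((u (random 1.0d0)))          ; uniform draw, 0 <= u < 1
    (/ (- (log (- 1.0d0 u))) rate)))

;; The sample mean should be close to 1/rate (the value varies per run).
(let* ((m 20000)
       (samples (loop repeat m collect (sample-exponential 2.0d0))))
  (/ (reduce #'+ samples) m))
```

Since the uniform draw is strictly less than 1, the argument of log is always positive.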

Approximating CDF

Now, let us assume we have the PDF $p(x)$ on a fixed finite interval $[a,b]$, and let us subdivide the interval into equal pieces of length $\epsilon>0$ \[ x_i = a + i\cdot \epsilon \text{ where } i=0,\ldots,n \text{ and } n=\lfloor (b-a)/\epsilon \rfloor \] and consider the numbers $p(x_0),\ldots,p(x_n)$. We can approximate the CDF via a Riemann sum

\[ c(x) \approx \epsilon\sum_{i=0}^{\lfloor (x-a)/\epsilon \rfloor} p(x_i) \]

I can do that in lisp. First, I’ll need a reduce function which returns all of the intermediate results, that is, the partial sums when the function is addition.

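;; Fold FN over XS from the left and return the list of all intermediate
;; results (the running sums when FN is #'+).  CARRY accumulates the
;; results in reverse order.
(defun reduce-with-intermediates (fn xs &optional carry)
  (if (null xs)
      (nreverse carry)
      (let ((res (if (null carry)
                     ;; first element: FN is called with one argument,
                     ;; so FN must accept that (#'+ returns it unchanged)
                     (funcall fn (car xs))
                     (funcall fn (car xs) (car carry)))))
        (reduce-with-intermediates fn (cdr xs) (cons res carry)))))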

REDUCE-WITH-INTERMEDIATES
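To make "intermediate results" concrete for the case of addition, here is a small loop-based sketch that computes the same running sums. The name running-sums is hypothetical; it is not used elsewhere in this post.

```lisp
;; Running sums of a list of numbers: the i-th output is the sum of
;; the first i inputs.
(defun running-sums (xs)
  (loop for x in xs
        sum x into acc
        collect acc))

(running-sums '(1 2 3 4)) ; => (1 3 6 10)
```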

Let us test. I am going to convert the sine function into a PDF:

(defparameter epsilon 0.0001)

(defparameter sine-cdf-xs 
   (loop for x from 0 to 1.0 by epsilon collect x))

(defparameter sine-cdf-ys
   (reduce-with-intermediates
           #'+
           (mapcar (lambda (x) (* epsilon (/ pi 2.0) (sin (* pi x)))) 
                   sine-cdf-xs)))

EPSILON
SINE-CDF-XS
SINE-CDF-YS

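I did have to scale the sine function by a suitable constant so that it integrates to 1 on $[0,1]$: the PDF here is $p(x) = \frac{\pi}{2}\sin(\pi x)$, and only then do the partial sums look like a CDF.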
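The scaled sine density has the exact CDF $(1-\cos(\pi x))/2$ on $[0,1]$, so the Riemann-sum approximation can be checked against it. The following is a self-contained sketch under my own assumptions: riemann-cdf is a hypothetical helper, and it takes the number of subintervals $n$ instead of $\epsilon$ to avoid rounding issues when computing the grid size.

```lisp
;; Hypothetical helper: Riemann-sum approximation of the CDF of PDF on
;; [a,b] using N subintervals of length eps = (b-a)/N.
;; Returns two lists: the grid points x_i and the running sums y_i.
(defun riemann-cdf (pdf a b n)
  (let ((eps (/ (- b a) n)))
    (loop with acc = 0.0d0
          for i from 0 to n
          for x = (+ a (* i eps))
          do (incf acc (* eps (funcall pdf x)))
          collect x into xs
          collect acc into ys
          finally (return (values xs ys)))))

;; Rows of (x, approximate CDF, exact CDF) at a few grid points.
(multiple-value-bind (xs ys)
    (riemann-cdf (lambda (x) (* (/ pi 2) (sin (* pi x)))) 0.0d0 1.0d0 1000)
  (loop for i in '(250 500 750 1000)
        collect (list (nth i xs)
                      (nth i ys)
                      (/ (- 1 (cos (* pi (nth i xs)))) 2))))
```

The two value columns should agree up to an error of the order of $\epsilon$.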

So, we get a sample of values of the sine-cdf function. But we don’t want to approximate sine-cdf itself, we want its inverse. Before we dive into that, let us first discuss interpolation.

Interpolation

Given a list of pairs $(x_1,y_1),\ldots,(x_N,y_N)$ with $x_1<\cdots<x_N$, if we want a function $f(x)$ that interpolates these points we would need the index $i$ with $x\in [x_i,x_{i+1}]$ and then we would write \[ f(x) = \frac{x_{i+1}-x}{x_{i+1}-x_i} y_i + \frac{x-x_i}{x_{i+1}-x_i} y_{i+1} \] if we wanted to interpolate linearly. However, I am going to use a much simpler scheme: I’ll use the value at the left end-point \[ f(x) = y_i \]

Let us implement that:

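;; Integer midpoint of A and B, rounded down.
(defun middle (a b)
   (floor (+ a b) 2))

;; Binary search: index i of the largest element of the sorted sequence XS
;; with XS[i] <= X, or 0 if X is smaller than every element.
;; Note that ELT is linear-time on lists, so each step costs more on a
;; list than it would on a vector.
(defun place (x xs &optional (a 0) (b (length xs)))
   (let ((c (middle a b)))
      (cond ((<= (- b a) 1) a)
            ((>= x (elt xs c)) (place x xs c b))
            (t (place x xs a c)))))

;; Left-end-point interpolation: returns the function x -> y_i
;; where x_i is the closest grid point at or below x.
(defun interpolate (xs ys)
   (lambda (x) (elt ys (place x xs))))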

MIDDLE
PLACE
INTERPOLATE

Let us test on the sine function:

(let* ((xs (sort (loop repeat 500 collect (random pi)) #'<))
       (ys (mapcar #'sin xs))
       (approx (interpolate xs ys))
       (zs (loop repeat 20 collect (random pi))))
  (format nil "~{~{~2,4f ~2,4f~%~}~}" (mapcar #'list (mapcar #'sin zs) (mapcar approx zs))))

.9527 .9526
.7193 .7226
.3221 .3220
.9819 .9814
.2509 .2636
.2576 .2636
.9685 .9691
.2978 .3008
.8657 .8691
.3628 .3625
.9984 .9982
.0427 .0367
.9509 .9503
.4417 .4583
.8723 .8688
.9999 1.0000
.7360 .7405
.3575 .3626
.1077 .1046
.8791 .8792
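For comparison, the linear interpolation formula from the beginning of this section could be sketched as follows. This is not the scheme used in the rest of the post; interpolate-linear is a hypothetical helper, and I assume the inputs are vectors with strictly increasing x-values.

```lisp
;; Hypothetical helper: piecewise-linear interpolation through the points
;; (xs[i], ys[i]).  XS and YS are vectors of equal length and XS is
;; strictly increasing.  Outside the grid the nearest end value is returned.
(defun interpolate-linear (xs ys)
  (let ((n (length xs)))
    (lambda (x)
      (cond ((<= x (aref xs 0)) (aref ys 0))
            ((>= x (aref xs (1- n))) (aref ys (1- n)))
            (t
             ;; binary search for lo with xs[lo] <= x < xs[lo+1]
             (let ((lo 0) (hi (1- n)))
               (loop while (> (- hi lo) 1)
                     do (let ((mid (floor (+ lo hi) 2)))
                          (if (>= x (aref xs mid))
                              (setf lo mid)
                              (setf hi mid))))
               ;; w is the relative position of x inside [xs[lo], xs[hi]]
               (let ((w (/ (- x (aref xs lo))
                           (- (aref xs hi) (aref xs lo)))))
                 (+ (* (- 1 w) (aref ys lo))
                    (* w (aref ys hi))))))))))

(let ((f (interpolate-linear #(0 1 2) #(0 10 40))))
  (list (funcall f 1/2) (funcall f 3/2)))  ; => (5 25)
```

The left-end-point scheme is cruder, but on a dense grid the difference is small and the code is shorter.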

Interpolation of inverse functions

For inverse functions, we do something very similar: assuming $f$ is increasing, we find the index $i$ that satisfies $x\in [f(x_i),f(x_{i+1})]$ and then let $f^{-1}(x)$ be an interpolation of $x_i$ and $x_{i+1}$: \[ f^{-1}(x) = \frac{f(x_{i+1})-x}{f(x_{i+1})-f(x_i)}x_i + \frac{x-f(x_i)}{f(x_{i+1})-f(x_i)}x_{i+1} \] However, in our case this would be the left end-point: \[ f^{-1}(x) = x_i \]

In other words, in both cases, we interpolate the inverse function by reversing $(x_i,y_i)$’s to $(y_i,x_i)$’s.

Let us test on the sine function again. But remember that sine is not invertible on all of $[0,\pi]$, so I restrict it to $[0,\pi/2]$ where it is increasing:

(let* ((xs (sort (loop repeat 500 collect (random (/ pi 2))) #'<))
       (ys (mapcar #'sin xs))
       (approx (interpolate ys xs))
       (zs (sort (loop repeat 20 collect (random 1.0d0)) #'<)))
  (format nil "~{~{~2,4f ~2,4f~%~}~}" (mapcar #'list (mapcar #'asin zs) (mapcar approx zs))))

.0297 .0280
.0479 .0451
.0861 .0777
.1197 .1144
.1353 .1352
.2970 .2968
.3188 .3186
.3216 .3186
.4984 .4926
.7063 .7009
.7110 .7068
.8620 .8552
.8935 .8909
.9680 .9629
1.0176 1.0126
1.1661 1.1652
1.2155 1.2144
1.2319 1.2256
1.2635 1.2632
1.3829 1.3829
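The swap of the pairs can also be checked by hand on a tiny table. The sketch below is self-contained and uses a hypothetical compact lookup, left-endpoint-lookup, built on the standard position function; it does a linear scan rather than a binary search, which is fine for a five-element table.

```lisp
;; Hypothetical compact version of the left-end-point lookup:
;; return ys[i] for the largest i with xs[i] <= x (or i = 0 if none).
(defun left-endpoint-lookup (xs ys)
  (lambda (x)
    (let ((i (or (position x xs :test #'>= :from-end t) 0)))
      (elt ys i))))

(let* ((xs '(0 1 2 3 4))
       (ys (mapcar (lambda (x) (* x x)) xs))   ; (0 1 4 9 16)
       (square (left-endpoint-lookup xs ys))
       (square-root (left-endpoint-lookup ys xs)))
  (list (funcall square 2.5)         ; 4, the value at the grid point 2
        (funcall square-root 10)))   ; 3, since 9 <= 10 < 16
;; => (4 3)
```

Swapping the two arguments only makes sense because the y-values are themselves sorted, which is exactly the monotonicity assumption.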

Approximating the inverse of CDF

Let us test the inversion on the sine-cdf:

(let* ((sine-cdf (interpolate sine-cdf-xs sine-cdf-ys))
       (sine-cdf-inverse (interpolate sine-cdf-ys sine-cdf-xs))
       (xs (sort (loop repeat 20 collect (random 1.0d0)) #'<))
       (ys (mapcar sine-cdf xs))
       (zs (mapcar #'list xs (mapcar sine-cdf-inverse ys))))
  (format nil "~{~{~2,4f ~2,4f~%~}~}" zs))

.0888 .0888
.0900 .0900
.1540 .1539
.2175 .2175
.2193 .2192
.2420 .2419
.2652 .2652
.4074 .4074
.4357 .4357
.5075 .5075
.5605 .5604
.6556 .6556
.6667 .6666
.6740 .6740
.7127 .7126
.7499 .7498
.7996 .7995
.8755 .8755
.9484 .9483
.9705 .9704

Random sampling from an arbitrary random variable

Given a PDF $p(x)$, I would first need its CDF $c(x)$. In the absence of a closed formula, we would need a dense enough sample $(x_1,p_1),\ldots,(x_N,p_N)$ of points and associated probabilities (for an equally spaced grid, $p_j \approx \epsilon\, p(x_j)$) to write an interpolation of the CDF using partial sums \[ y_i = \sum_{j=1}^i p_j \] to get the pairs $(x_1,y_1),\ldots,(x_N,y_N)$, and then an interpolation of the inverse function $c^{-1}$ using the reversed pairs $(y_1,x_1),\ldots,(y_N,x_N)$. Then we take a good sample of uniform random numbers $u_1,\ldots,u_m$ from $[0,1]$ and evaluate $c^{-1}(u_1),\ldots,c^{-1}(u_m)$.

For the running example we get:

(let ((sine-cdf-inverse (interpolate sine-cdf-ys sine-cdf-xs)))
   (defun sample-sine-pdf (m)
       (loop repeat m collect (funcall sine-cdf-inverse (random 1.0)))))

(format nil "~{~2,4f~%~}" (sample-sine-pdf 10))

SAMPLE-SINE-PDF
.6104
.5235
.8408
.6770
.7273
.3837
.0471
.5568
.3339
.7586
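The whole pipeline can be packaged for an arbitrary density on a finite interval. The following is a minimal self-contained sketch under my own assumptions, not a definitive implementation: make-sampler is a hypothetical helper, it stores the table in vectors, and it draws the uniform number from $[0, y_n)$ where $y_n$ is the last partial sum, so that a table whose total is not exactly 1 still works.

```lisp
;; Hypothetical helper: return a closure that samples from PDF on [a,b]
;; via a left-end-point inverse of the Riemann-sum CDF on N subintervals.
(defun make-sampler (pdf a b n)
  (let* ((eps (/ (- b a) n))
         (xs (make-array (1+ n)))
         (ys (make-array (1+ n)))
         (acc 0.0d0))
    ;; Tabulate the grid points and the running Riemann sums.
    (dotimes (i (1+ n))
      (let ((x (+ a (* i eps))))
        (incf acc (* eps (funcall pdf x)))
        (setf (aref xs i) x
              (aref ys i) acc)))
    (lambda ()
      ;; Uniform draw below the total mass ACC.
      (let ((u (random acc)) (lo 0) (hi n))
        ;; Binary search for the largest lo with ys[lo] <= u (or lo = 0).
        (loop while (> (- hi lo) 1)
              do (let ((mid (floor (+ lo hi) 2)))
                   (if (>= u (aref ys mid))
                       (setf lo mid)
                       (setf hi mid))))
        (aref xs lo)))))

;; Fraction of samples from the sine PDF that land in [0.25,0.75].
(let* ((sampler (make-sampler (lambda (x) (* (/ pi 2) (sin (* pi x))))
                              0.0d0 1.0d0 10000))
       (m 20000)
       (hits (loop repeat m count (<= 0.25d0 (funcall sampler) 0.75d0))))
  (float (/ hits m)))
```

For the sine PDF the exact probability of $[0.25,0.75]$ is $\sqrt{2}/2\approx 0.707$, so the returned fraction should be close to that value, which is the counting property we set out to achieve.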
