Random Number Generation
Many simulations depend on the computer's ability to generate
many "random" numbers very quickly.
It is most important to be able to do this for the uniform distribution,
and generate real numbers between 0 and 1,
in such a way that the probability the generated number is in an interval
between a and b equals b - a.
Theory also says that the sequence of numbers should be independent,
namely each subsequent value in the sequence does not depend
in any way on any of the previous values in the sequence.
This is an impossible computational task,
but it can be approximated reasonably well.
To satisfy the requirement that the computer is able to generate
many "random" numbers quickly that come close to satisfying
the desired theoretical properties,
short algorithms which produce a sequence of pseudo-random
numbers are often employed.
These pseudo-random numbers in fact form a deterministic sequence,
and the same list of numbers will be cycled over and over.
However,
with some care,
this cycle can be made to be so long that the lack of true independence
is unimportant.
(A poor choice can cause big problems.
We'll look at some poor examples too.)
Linear Congruential Uniform Generators
One of the most commonly used pseudo-random generators
is creates a sequence of numbers by the algorithm
x_{n+1} = (a x_n + c) mod m
The mod refers to modulo integer division.
this algorithm depends on four numbers,
a, c, m, and x_0, called the
seed.
For computational speed,
m is almost always chosen to be 2^p where p is the number of bits
in an integer.
Topics in the classroom:
- Representing integers in base 2.
- Example with m = 8.
Theory for the length of the cycle of a generator
A mixed linear congruential generator will have period m if and only if
- c is relatively prime to m
- a mod p = 1 for every prime factor p of m
- a mod 4 = 1 if 4 is a factor of m
A multiplicative generator (c=0) will have a maximum possible period
of v where v is the smallest integer such that a^v mod m = 1.
This will be achieved if
- a mod 8 = 3 or 5
- the seed is odd
Tests for pseudo-random number generators
Pseudo-random number generators should produce sequences of numbers
which fall uniformly in the range from 0 to 1 and come close to passing
various tests of independence.
Here is a description of several tests for pseudo-random number generators.
- Chi-square Goodness of Fit Test.
Divide the interval into n equal bins and count the number of observations
which fall into each bin over a long run.
To test the hypothesis that the data comes from a uniform distribution,
compute the statistic
X^2 = sum (observed - expected)^2 / expected
where the sum goes over all n bins.
The expected count in each cell is (# of random numbers) / (# of bins).
This statistic may be compared to a chi-square distribution
with n - 1 degrees of freedom
to test the uniformity of the generator.
- Permutation Test.
Generate a large number, n, groups of a small number, k, random numbers.
For each of the n groups, record the sorted order of the group.
For example, if the numbers were
0.01234 0.4536 0.3967 0.8345 0.3915
you would record the order
1 4 3 5 2
In this example,
each of the 5! = 120 permutations should have the same
probability of occuring.
Test this with a chi-square goodness of fit test.
- Sample Autocorrelation.
Informally speaking, a stationary time series
is a sequence of random variables with same mean, the same variance,
and no cyclical trends.
More formally, a stationary time series
has the property that the joint distribution of any set of random numbers
in the sequence remains the same if the whole set is shifted in time
by a fixed amount.
For example, the joint distribution of X_2, X_3, and X_5 would be the same
as the joint distribution of X_4, X_5, and X_7 or as X_12, X_13, and X_15.
The autocovariance function is a function of the lag.
For example, the autocovariance function evaluated at a lag of 2 equals
Cov(X_t, X_{t+2}) for any t (in a stationary sequence).
Dividing the autocovariance function by its value at lag 0
(simply the variance of X_t for any t) gives the autocorrelation
function, acf.
The acf equals 1 at lag 0, and is always between -1 and 1.
Because the correlation of a pair of independent random variables is 0,
we can use the acf as a test of our pseudo-random number generator.
The function acf will plot the autocorrelation function estimated
from a sequence of numbers.
The dotted lines represent the limits of a hypothesis test that the acf
equals 0 at various lags with significance level of 0.05.
(A few may extend beyond these lines by chance,
but the proportion should be small, and no deviation, other than that at
0 lag, should extend greatly beyond the dotted lines.)
- Testing in Higher Dimensional Space.
Pairs of independent random numbers should fall evenly in the unit square,
and triples of independent random numbers should fall evenly in the unit cube.
(Higher dimensional analogues are also conceptually testable.)
While a rigorous test can be done by dividing the cube into lots of little
cubes and counting the number of hits in each,
followed by a chi-square test for goodness of fit to the uniform distribution,
simple graphical measures of randomness are informative and can demonstrate
powerfully why a pseudo-random number generator might fail.
As an example, graph triples of pseudo-random numbers
in three dimensional space.
Using software which can rotate clouds of points in three dimensions,
examine the output to see if the points appear to be uniformly scattered
throughout the unit cube, or if there is some great regularity
in where the points fall.
Last modified: March 31, 1997
Bret Larget,
larget@mathcs.duq.edu