Random Number Generation

Many simulations depend on the computer's ability to generate many "random" numbers very quickly. It is most important to be able to do this for the uniform distribution, and generate real numbers between 0 and 1, in such a way that the probability the generated number is in an interval between a and b equals b - a. Theory also says that the sequence of numbers should be independent, namely each subsequent value in the sequence does not depend in any way on any of the previous values in the sequence.

This is an impossible computational task, but it can be approximated reasonably well. To satisfy the requirement that the computer is able to generate many "random" numbers quickly that come close to satisfying the desired theoretical properties, short algorithms which produce a sequence of pseudo-random numbers are often employed. These pseudo-random numbers in fact form a deterministic sequence, and the same list of numbers will be cycled over and over. However, with some care, this cycle can be made to be so long that the lack of true independence is unimportant. (A poor choice can cause big problems. We'll look at some poor examples too.)

Linear Congruential Uniform Generators

One of the most commonly used pseudo-random generators is creates a sequence of numbers by the algorithm

x_{n+1} = (a x_n + c) mod m

The mod refers to modulo integer division. this algorithm depends on four numbers, a, c, m, and x_0, called the seed. For computational speed, m is almost always chosen to be 2^p where p is the number of bits in an integer.

Topics in the classroom:

Representing integers in base 2.
Example with m = 8.

Theory for the length of the cycle of a generator

A mixed linear congruential generator will have period m if and only if

c is relatively prime to m
a mod p = 1 for every prime factor p of m
a mod 4 = 1 if 4 is a factor of m

A multiplicative generator (c=0) will have a maximum possible period of v where v is the smallest integer such that a^v mod m = 1. This will be achieved if

a mod 8 = 3 or 5
the seed is odd

Tests for pseudo-random number generators

Pseudo-random number generators should produce sequences of numbers which fall uniformly in the range from 0 to 1 and come close to passing various tests of independence. Here is a description of several tests for pseudo-random number generators.

Chi-square Goodness of Fit Test. Divide the interval into n equal bins and count the number of observations which fall into each bin over a long run. To test the hypothesis that the data comes from a uniform distribution, compute the statistic
```
X^2 = sum (observed - expected)^2 / expected
```
where the sum goes over all n bins. The expected count in each cell is (# of random numbers) / (# of bins). This statistic may be compared to a chi-square distribution with n - 1 degrees of freedom to test the uniformity of the generator.
Permutation Test. Generate a large number, n, groups of a small number, k, random numbers. For each of the n groups, record the sorted order of the group. For example, if the numbers were
```
0.01234 0.4536 0.3967 0.8345 0.3915
```
you would record the order
```
1 4 3 5 2
```
In this example, each of the 5! = 120 permutations should have the same probability of occuring. Test this with a chi-square goodness of fit test.
Sample Autocorrelation. Informally speaking, a stationary time series is a sequence of random variables with same mean, the same variance, and no cyclical trends. More formally, a stationary time series has the property that the joint distribution of any set of random numbers in the sequence remains the same if the whole set is shifted in time by a fixed amount. For example, the joint distribution of X_2, X_3, and X_5 would be the same as the joint distribution of X_4, X_5, and X_7 or as X_12, X_13, and X_15. The autocovariance function is a function of the lag. For example, the autocovariance function evaluated at a lag of 2 equals Cov(X_t, X_{t+2}) for any t (in a stationary sequence). Dividing the autocovariance function by its value at lag 0 (simply the variance of X_t for any t) gives the autocorrelation function, acf. The acf equals 1 at lag 0, and is always between -1 and 1.
Because the correlation of a pair of independent random variables is 0, we can use the acf as a test of our pseudo-random number generator. The function acf will plot the autocorrelation function estimated from a sequence of numbers. The dotted lines represent the limits of a hypothesis test that the acf equals 0 at various lags with significance level of 0.05. (A few may extend beyond these lines by chance, but the proportion should be small, and no deviation, other than that at 0 lag, should extend greatly beyond the dotted lines.)
Testing in Higher Dimensional Space. Pairs of independent random numbers should fall evenly in the unit square, and triples of independent random numbers should fall evenly in the unit cube. (Higher dimensional analogues are also conceptually testable.) While a rigorous test can be done by dividing the cube into lots of little cubes and counting the number of hits in each, followed by a chi-square test for goodness of fit to the uniform distribution, simple graphical measures of randomness are informative and can demonstrate powerfully why a pseudo-random number generator might fail.
As an example, graph triples of pseudo-random numbers in three dimensional space. Using software which can rotate clouds of points in three dimensions, examine the output to see if the points appear to be uniformly scattered throughout the unit cube, or if there is some great regularity in where the points fall.

Last modified: March 31, 1997

Bret Larget, larget@mathcs.duq.edu