Random Numbers
Ryan Biffard, biffard at enel dot ucalgary dut ca
April 19, 2003
Introduction:
Randomness is the inability of predicting the outcome of an event with any certainty. More specifically, for computers generating true
random numbers involves generating a series of 1's and 0's where the next number in the series has equal probability of being a 1 or
a 0, and there is no method for predicting which it will be. This paper will discuss methods for generating true and pseudo random
numbers, and discuss methods for statistical analysis of numbers to determine randomness. Some specific methods for generating
random numbers will also be discussed.
Randomness and Cryptography
Randomness is vital to securing a system. For example if all passwords were predictable, any attacker would be able to gain full
access to any computer system. Secure communication over an insecure network requires that messages be delivered in such a way
that only the intended audience can reconstruct the message. If there is no randomness in the encryption method any observer
would be able to easily reconstruct the message. Harnessing a source of randomness to protect a system is the heart of all
cryptographic systems.
Different types of Random numbers
Some situations call for reproducible results. For example a researcher doing simulations comparing different processes on the
same set of random numbers. One method to generate such a list would be to use a true random number generator and store the results
to be used repeatedly. A more practical method would be to use a pseudo random number generator. I won't go into detail about the
methods for creating a pseudo random number generator. All that is important to know about pseudo random numbers is that they are
generated by software algorithms, generating the next bit with equal probability of being a 1 or 0, and that it is difficult to
predict the next bit without the knowledge of the algorithm used. Simply by entering the initial "seed" number, the same random
series will be generated by the algorithm every time. For cryptographic purposes pseudo random numbers are not secure enough;
someone could figure out the algorithm for generating the numbers and break the security. For example: you are communicating
on two different encrypted channels, and you generated random keys for both channels using a pseudorandom number generator. If
one of the parties you are communicating with knows your random number generating algorithm, it is possible for him to guess the
key you are using on the other channel. Even though pseudo random numbers are not sufficient to be used for cryptography, there
is still use for them which will be discussed later.
Randomness and Entropy:
In the world of science, randomness and entropy share a strong relationship. Entropy is a measure the disorder of a system.
A system which displays a high degree of disorder (high entropy) can be considered almost or perfectly random. Since all
logic programmed using a computer is deterministic, generating true random numbers involves harnessing an external source
of entropy.
Sources of Entropy
Many different sources of entropy have been used to help generate random numbers. A perfect source of entropy can be found in
radioactive decay; there is no way of predicting at what time the decay will occur. Less perfect sources of entropy can be found very
close to a computer, and therefore make more appropriate, and more economical solutions.
Mouse Movements: It is virtually impossible to predict how the mouse will move around a users screen, and seems like a good
way of generating random numbers. The disadvantages of using this system is that the user must be actively using the mouse for randomness
to occur. If the program is run on a workstation were there are no users, this method of randomness is useless.
Key Strokes: Like mouse movements keystrokes are very difficult to predict. It seems impossible to determine when
the next key stroke will occur, and what key will be pressed. This system suffers from the same faults as mouse movements:
it requires user input and also key strokes are usually buffered by the operating system, and therefore the stroke time is
more predictable.
Hard Drives: When a disk drive is asked to fetch information, the time required to get that information varies depending on the
location it is stored on the physical disk. It seems like that time would be a good source of entropy, and for many applications it might be.
But like keyboard strokes get buffered by the Operating System, Disk access also get buffered by the Disk drive, to maximize the speed.
this buffering causes the randomness of the seek time to be reduced. When a disk drive is spinning, the drive motor tries to maintain
a constant rotational velocity. Variations in resistance can cause minor fluctuations in the velocity of the drive, but more importantly
small air turbulence is created on the surface of the drive, which causes drive speed variations. The nature of the disturbances is truly
random, and causes totally random small velocity changes to the drive motor. Recently the drive rotation velocity has been monitored and
used to create true random numbers at a very low cost.
Radio Noise: Using an antenna to pick up random radio noise is an easy way to capture entropy from the environment.
Radio noise is highly localized so chances that 2 identical random number generators in the same room would generate the same
random numbers is very unlikely, thus increasing security. Perhaps a radio signal could be introduced by an attacker to increase
the probability of predicting the generated numbers, but would require extensive knowledge of the hardware. Radio noise is the
source of entropy used to generate numbers for random.org random number generation service.
Lava Lamps: Some Engineers at Silicon Graphics developed a system to generate random numbers using the entropy of lava lamps.
Several lava lamps are used with a cameras focused on each lamp. The images captured by the cameras are used to generate random numbers.
This is probably the only random number generator that looks really cool as a side product of generating random numbers.
Combinations of sources: Recently people have been taking advantage of the random number sources available on the internet,
and combining the output of multiple sources to provide another level of confidence in the randomness of the numbers. Perhaps this
extra randomness is not required since each random source is truly as random as possible.
Verifying Randomness
Statistical methods are used to verify that a series of numbers is truly random. These methods are used to test pseudo
random and true random number generators to determine their effectiveness. Along with statistical methods, algorithms
are applied to try to compress the random data, truly random data should not be representable in any fewer bits.
Randomness Tests
Chi Squared Test: The chi squared test is a basic statistical test which measures the fit of a distribution
of numbers to the uniform distribution. A random set of numbers should have a highly uniform distribution, showing that
each number in the range has equal probability of being generated. The more uniformity in the distribution the higher
the probability that the numbers are random. Generally a rank of 10-90% fit is good enough for a series of numbers to
pass the test.
Entropy Test: The Entropy test measures the information density or compressibility of a series of numbers.
A random set of numbers should have a high degree of entropy or low compressibility. The ent project implements this test,
and presents its output as a number of bits per character, a score of 8 would indicate that the data is uncompressible.
A true random number series will have a grade very close to 8; a nonrandom series of numbers, such as binary data from a
text file, will have a lower grade indicating that it is highly compressible.
Arithmetic Mean: Simply calculate the average value of each 8bits as an unsigned integer. If a value is not near 127 the
series is probably not random.
Pi Estimation: The ent test uses 24bit values from the series of numbers as random x,y coordinates.
The probability that the coordinates will fall within a circle will approach pi. A data set of 1MB from random.org approaches the value of pi
to within 0.08%.
Serial Correlation: This tests the predictability of the next value based on the previous values. Random numbers will have a
serial correlation coefficient of close to 0, where as uncompressed voice, video or other signal will usually have a coefficient approaching
1.
Reverse arrangement test: In her final year paper at Trinity College Dublin, Louise Foley proposed this test be used
to evaluate randomness. The test shows long term trends in data such as mean shift, which none of the previous test will reliably
show.
Improving Randomness
In almost all cases, the raw collection of random numbers can show bias to a particular value, it is required that there is no bias
in the final output. Several techniques have been developed to overcome the natural bias in random number generation; here are some
examples.
transition mapping: This method looks at incoming data 2 bits at a time, if the data is either 01 or 10 it is accepted
00 and 11 are rejected. This guarantees that each bit will occur with exactly 50% probability. This technique can reduce the amount
of incoming data by 75% even if the data is not biased. It can also cause problems if the incoming data needs to be received at
regular intervals. Since much of the data can be thrown away, a large buffer must be kept full to ensure that random bits are
available when needed. Another disadvantage is that this method will limit the output range of the data; since there will never
be an output with more than 2 sequential bits being equal, not all possible numbers will be available.
Mixing Functions: By combining the input from two different random sources, the randomness can be increased. A
common method for mixing is a bitwise XOR operator. It is acceptable to mix the output of a pseudorandom number generator with
the output of a true random number generator. For example, mixing a true random number generator that uses
a transition mapping technique to guarantee a 50% probablity of 1's and 0's mixed with a high quality pseudo random number generator
will increase the range of the output to the full 2^n bit range while still maintaining the true randomness of the numbers.
Secure Hash Algorithm: Lavarand uses the SHA to mangle the random data bits to further randomize the values. The SHA can
be described as a complex Mixing Function. There are other hashing functions that can be used for data mixing such as the
US governments Data Encryption Standard (DES).
Conclusion
Randomness is a deep topic that is critical to secure network transactions, and other cryptographic applications. The amount of
randomness required to secure data or a communication channel can be calculated; based on that calculation a method of random number
generation can by determined. As computers get faster, and algorithms improve, the randomness required to secure a system will
continually be increasing
Helpful Resources
Random.org - Introduction to Random Numbers
Ent - Documentation for the pseudorandom number sequence test
LavaRand - How Lavarand works
Reverse Arrangement Test - description of the reverse arrangement test
Ohio State Randomness Recommendations for Security - removing bias, and other good randomness