Random Numbers

Ryan Biffard, biffard at enel dot ucalgary dut ca
April 19, 2003

Introduction:

Randomness is the inability of predicting the outcome of an event with any certainty. More specifically, for computers generating true random numbers involves generating a series of 1's and 0's where the next number in the series has equal probability of being a 1 or a 0, and there is no method for predicting which it will be. This paper will discuss methods for generating true and pseudo random numbers, and discuss methods for statistical analysis of numbers to determine randomness. Some specific methods for generating random numbers will also be discussed.

Randomness and Cryptography

Randomness is vital to securing a system. For example if all passwords were predictable, any attacker would be able to gain full access to any computer system. Secure communication over an insecure network requires that messages be delivered in such a way that only the intended audience can reconstruct the message. If there is no randomness in the encryption method any observer would be able to easily reconstruct the message. Harnessing a source of randomness to protect a system is the heart of all cryptographic systems.

Different types of Random numbers

Some situations call for reproducible results. For example a researcher doing simulations comparing different processes on the same set of random numbers. One method to generate such a list would be to use a true random number generator and store the results to be used repeatedly. A more practical method would be to use a pseudo random number generator. I won't go into detail about the methods for creating a pseudo random number generator. All that is important to know about pseudo random numbers is that they are generated by software algorithms, generating the next bit with equal probability of being a 1 or 0, and that it is difficult to predict the next bit without the knowledge of the algorithm used. Simply by entering the initial "seed" number, the same random series will be generated by the algorithm every time. For cryptographic purposes pseudo random numbers are not secure enough; someone could figure out the algorithm for generating the numbers and break the security. For example: you are communicating on two different encrypted channels, and you generated random keys for both channels using a pseudorandom number generator. If one of the parties you are communicating with knows your random number generating algorithm, it is possible for him to guess the key you are using on the other channel. Even though pseudo random numbers are not sufficient to be used for cryptography, there is still use for them which will be discussed later.

Randomness and Entropy:

In the world of science, randomness and entropy share a strong relationship. Entropy is a measure the disorder of a system. A system which displays a high degree of disorder (high entropy) can be considered almost or perfectly random. Since all logic programmed using a computer is deterministic, generating true random numbers involves harnessing an external source of entropy.

Sources of Entropy

Many different sources of entropy have been used to help generate random numbers. A perfect source of entropy can be found in radioactive decay; there is no way of predicting at what time the decay will occur. Less perfect sources of entropy can be found very close to a computer, and therefore make more appropriate, and more economical solutions.
  • Mouse Movements: It is virtually impossible to predict how the mouse will move around a users screen, and seems like a good way of generating random numbers. The disadvantages of using this system is that the user must be actively using the mouse for randomness to occur. If the program is run on a workstation were there are no users, this method of randomness is useless.
  • Key Strokes: Like mouse movements keystrokes are very difficult to predict. It seems impossible to determine when the next key stroke will occur, and what key will be pressed. This system suffers from the same faults as mouse movements: it requires user input and also key strokes are usually buffered by the operating system, and therefore the stroke time is more predictable.
  • Hard Drives: When a disk drive is asked to fetch information, the time required to get that information varies depending on the location it is stored on the physical disk. It seems like that time would be a good source of entropy, and for many applications it might be. But like keyboard strokes get buffered by the Operating System, Disk access also get buffered by the Disk drive, to maximize the speed. this buffering causes the randomness of the seek time to be reduced. When a disk drive is spinning, the drive motor tries to maintain a constant rotational velocity. Variations in resistance can cause minor fluctuations in the velocity of the drive, but more importantly small air turbulence is created on the surface of the drive, which causes drive speed variations. The nature of the disturbances is truly random, and causes totally random small velocity changes to the drive motor. Recently the drive rotation velocity has been monitored and used to create true random numbers at a very low cost.
  • Radio Noise: Using an antenna to pick up random radio noise is an easy way to capture entropy from the environment. Radio noise is highly localized so chances that 2 identical random number generators in the same room would generate the same random numbers is very unlikely, thus increasing security. Perhaps a radio signal could be introduced by an attacker to increase the probability of predicting the generated numbers, but would require extensive knowledge of the hardware. Radio noise is the source of entropy used to generate numbers for random.org random number generation service.
  • Lava Lamps: Some Engineers at Silicon Graphics developed a system to generate random numbers using the entropy of lava lamps. Several lava lamps are used with a cameras focused on each lamp. The images captured by the cameras are used to generate random numbers. This is probably the only random number generator that looks really cool as a side product of generating random numbers.
  • Combinations of sources: Recently people have been taking advantage of the random number sources available on the internet, and combining the output of multiple sources to provide another level of confidence in the randomness of the numbers. Perhaps this extra randomness is not required since each random source is truly as random as possible.
  • Verifying Randomness

    Statistical methods are used to verify that a series of numbers is truly random. These methods are used to test pseudo random and true random number generators to determine their effectiveness. Along with statistical methods, algorithms are applied to try to compress the random data, truly random data should not be representable in any fewer bits.

    Randomness Tests

  • Chi Squared Test: The chi squared test is a basic statistical test which measures the fit of a distribution of numbers to the uniform distribution. A random set of numbers should have a highly uniform distribution, showing that each number in the range has equal probability of being generated. The more uniformity in the distribution the higher the probability that the numbers are random. Generally a rank of 10-90% fit is good enough for a series of numbers to pass the test.
  • Entropy Test: The Entropy test measures the information density or compressibility of a series of numbers. A random set of numbers should have a high degree of entropy or low compressibility. The ent project implements this test, and presents its output as a number of bits per character, a score of 8 would indicate that the data is uncompressible. A true random number series will have a grade very close to 8; a nonrandom series of numbers, such as binary data from a text file, will have a lower grade indicating that it is highly compressible.
  • Arithmetic Mean: Simply calculate the average value of each 8bits as an unsigned integer. If a value is not near 127 the series is probably not random.
  • Pi Estimation: The ent test uses 24bit values from the series of numbers as random x,y coordinates. The probability that the coordinates will fall within a circle will approach pi. A data set of 1MB from random.org approaches the value of pi to within 0.08%.
  • Serial Correlation: This tests the predictability of the next value based on the previous values. Random numbers will have a serial correlation coefficient of close to 0, where as uncompressed voice, video or other signal will usually have a coefficient approaching 1.
  • Reverse arrangement test: In her final year paper at Trinity College Dublin, Louise Foley proposed this test be used to evaluate randomness. The test shows long term trends in data such as mean shift, which none of the previous test will reliably show.
  • Improving Randomness

    In almost all cases, the raw collection of random numbers can show bias to a particular value, it is required that there is no bias in the final output. Several techniques have been developed to overcome the natural bias in random number generation; here are some examples.
  • transition mapping: This method looks at incoming data 2 bits at a time, if the data is either 01 or 10 it is accepted 00 and 11 are rejected. This guarantees that each bit will occur with exactly 50% probability. This technique can reduce the amount of incoming data by 75% even if the data is not biased. It can also cause problems if the incoming data needs to be received at regular intervals. Since much of the data can be thrown away, a large buffer must be kept full to ensure that random bits are available when needed. Another disadvantage is that this method will limit the output range of the data; since there will never be an output with more than 2 sequential bits being equal, not all possible numbers will be available.
  • Mixing Functions: By combining the input from two different random sources, the randomness can be increased. A common method for mixing is a bitwise XOR operator. It is acceptable to mix the output of a pseudorandom number generator with the output of a true random number generator. For example, mixing a true random number generator that uses a transition mapping technique to guarantee a 50% probablity of 1's and 0's mixed with a high quality pseudo random number generator will increase the range of the output to the full 2^n bit range while still maintaining the true randomness of the numbers.
  • Secure Hash Algorithm: Lavarand uses the SHA to mangle the random data bits to further randomize the values. The SHA can be described as a complex Mixing Function. There are other hashing functions that can be used for data mixing such as the US governments Data Encryption Standard (DES).
  • Conclusion

    Randomness is a deep topic that is critical to secure network transactions, and other cryptographic applications. The amount of randomness required to secure data or a communication channel can be calculated; based on that calculation a method of random number generation can by determined. As computers get faster, and algorithms improve, the randomness required to secure a system will continually be increasing

    Helpful Resources

    Random.org - Introduction to Random Numbers
    Ent - Documentation for the pseudorandom number sequence test
    LavaRand - How Lavarand works
    Reverse Arrangement Test - description of the reverse arrangement test
    Ohio State Randomness Recommendations for Security - removing bias, and other good randomness