Index of /examples/random_numbers

Name	Last modified	Size

Parent Directory		-
xoroshiro128_plus/	2016-09-30 10:55	-

Random Number Generation Best Practices

Introduction

The generation of random numbers in computers is a well-researched area of computer science. In the context of science and engineering, the use of random numbers in simulations and other programs typically involves the use of pseudo-random numbers. Pseudo-random numbers are sequences of numbers that approximate truly random numbers based on a repeated computations starting from an initial state. The purpose of these examples is to present an example pseudo-random number generator (RNG) that is suitable for any purpose except for very large scale programs involving hundreds or thousands of CPUs.

A poor choice of RNG algorithm could lead to results of questionable quality!

Click here to jump to recommended RNGs for science and engineering applications.

Some relevant publications

Here are some journal articles relevant to this subject for motivation. The full text is available at the links:

What to look for in an RNG

Here are some rules of thumb for choosing an RNG for your project, from the viewpoint of science and engineering software. A good RNG algorithm should have at least the following characteristics:

High performance - codes such as Monte Carlo require a very large quantity of random numbers. Speed counts. Cryptographic quality RNGs are typically too slow for use in these applications.
Long period - All computational RNGs have a set period after which the sequence of numbers will restart. The period should be long enough that this is never a concern while your program is running. A minimum period of 2⁶⁴ (~1x10¹⁹) is recommended.
Sufficient bits - if your application requires random double precision (64-bit) floating point numbers don't use an RNG that produces 32-bits of randomness. To keep things simple, always use an RNG that produces 64-bits unless a specific problem (e.g. software running on 32-bit low speed embedded processors) demands otherwise.
Sufficiently random(!) - An RNG should be backed by publications in major journals, perhaps a pedigree based on previous high-quality algorithms, and be documented to have passed standard and accepted random number test suites such as TestU01 or PractRand.
Implementation quality - The RNG should be implemented by someone skilled in the language being used. While most RNG algorithms do not involve many lines of codes the details matter.

What to AVOID in an RNG

Again, this list is oriented towards science and engineering software. The types of RNGs listed here are adequate for many other purposes.

Standard language RNGs - These should be avoided unless you have checked how they are implemented (see below). These typically fail statistical tests for randomness by a wide margin. Examples include:
- The random(), rand(), and drand48() functions in the standard C library
- The RAND and RANDOM_NUMBER functions in Fortran. RANDOM_NUMBER may be usable but its implementation depends on the compiler vendor. This means your program will not necessarily produce the same output from the same seed when compiled with gfortran, ifort, or pgf90 on the SCC!
- Perl's rand function
- Matlab's rand function before the 2007a release.
- The java.util.Random class in Java
- Excel's random function
- The MultiCarry RNG in R
- And many more...
The one your group has always used - Be skeptical. Don't assume the RNG you inherited from a previous student is worthwhile.
Your own implementation - With the quantity of excellent, expertly written RNS codes available there is is little need to write your own. If your choice of algorithm is poor or a subtle error is made it can be hard to detect. Finally, many high performance implementations are available that use the SIMD instructions available on modern CPUs.

Recommended RNGs for several languages

Here are a few recommended RNG algorithms.

Mersenne-Twister (MT) - A widely-used and well-known RNG algorithm. The period is colossally large at 2¹⁹⁹³⁷-1. There are implementations of this available for any language you are likely to work in. High-speed SIMD implementations are also available. Drawbacks: not the fastest RNG, requires a large amount of state (2504 bytes) to operate, difficult to work with effectively in parallel computations. If you want to use this, just do a search on this algorithm with the language of your choice
- Fortran: A Fortran 2003 interface to the SIMD-accelerated version of the MT is on Github. A Fortran 90 (non-SIMD) implementation is also available.
- C++: The C++11(2011 standard) has the std::mt19937 class.
- R: .Random now defaults to the Mersenne Twister. It can be set explicitly with the RNGkind("Mersenne-Twister") command.
- C: A high performance, SIMD-enabled implementation is available.
- Java: A popular implementation is available from George Mason Univ. professor Sean Luke.
- Matlab: The MT is the default RNG since their 2007 release. Use the comamnd RandStream.getGlobalStream to see its configuration.
- Python: This has used the MT as the default RNG since release 2.3.
- Perl: Try the Math::Random::MT module from CPAN.
xoroshiro128+ - A relatively new algorithm that is also very fast, very random, and appropriate for parallel computations. This is the latest in a series of algorithms based on the xorshift class of RNGs discovered by George Marsaglia. Sample implementations of this algorithm in C, C++, and Fortran 90 are provided in the directories of this example. The period is 2¹²⁸-1.
PCG - Another relatively new algorithm family that is very fast, very random, and appropriate for parallel computations. Implementations are available in C, C++, and Haskell on the PCG website. The period depends on the version used, their sample code includes 2⁶⁴-1 and 2¹²⁸-1 period generators. If you are interested in using this algorithm in your code and need some help, please contact RCS.

Seeding a RNG

Every computational RNG requires a starting value. There are many ways to generate random starting seeds. For an example of how to use the Linux /dev/random hardware RNG utility that is available on the SCC computers, see the C code example in this directory. Another way is to retrieve some seed values from hardware-based random number generators available on the Internet. Random.org has an HTTP-based API to pull random bytes based on atmospheric radio noise, and HotBits offers a web form to download random bytes based on radioactive decay.

Seeding from the current system clock value is another way to seed an RNG. However, if multiple processes (ex. MPI) or threads (ex. OpenMP) on a computer are seeding separate RNGs it is entirely possible for some or all of them to grab the same clock value, depending on the resolution of the clock and when the value is retrieved. A better approach is to seed a single RNG from the clock and then use its sequence of numbers to provide seeds to the different processes or threads. Another approach would be to use an RNG like the PCG or xoroshiro128+ generators that can be "jumped" ahead by a large amount (say 2⁶⁴ numbers) based on a process or thread id. This provides each process or thread with their own separate sequence of numbers based on a single seed value. For an example, see the C++ and C sample codes for the xoroshiro128+ generator in this directory.

Document written by Brian Gregor on 9/28/2016. Last modified on 9/29/2016.