Hugo Pfoertner all at abouthugo.de
Thu Jun 24 21:46:41 CEST 2004

y.kohmoto wrote:

>    Hugo, Jim.
>    Thank you very much for many informations about randomness.
>    I will read the description about DIEHARD.

Jim Nastos wrote:

>>   In addition to all of Hugo Pfoertner's suggestions, there is a neat
>> measure of randomness related to incompressibility (which has obvious
>> applications in the field of data compression.) This is commonly
>> discussed in the context of Kolmogorov complexity.
>   I was interested in these words "neat measure of randomness related to
> incompressibility".
>   I mean only the words, because I have no knowledge about deta
> compression at all.

One idea: Try to map a sequence whose "randomness" shall be measured on
a binary (text) file using a unique mapping.  Create another file using
the same mapping technique, but using some input generated from a source
that is known to produce generally accepted good random numbers
(e.g. use the Mersenne Twister RNG). Use various "state of the art"
data compression programs (see e.g. http://compression.ca/ ) and compare
the results. You need very long sequences to get conclusive results
(e.g. > 1Mbyte) file size of uncompressed files. If the file created
the sequence is better compressible than the "random reference" file,
then the compressiong algorithm has found something it can learn
skewness in probablities, autocorrelation, etc.). The best available
data compression programs have fairly sophisticated adaptive prediction
methods and try to create minimum-entropy codes, e.g. by adaptive
arithmetic coding.  Try to get a copy of the classic book

Text Compression (Prentice Hall Advanced Reference Series)
by Timothy C. Bell, John G. Cleary, Ian H. Witten 

which describes some of the underlying concepts of adaptive compression

A more recent book is 
Compression and Coding Algorithms by Alistair Moffat and Andrew Turpin,
Kluwer Academic Publishers, 2002

For much, much more material see Mark Nelson's

>   I want to know if some sequences which I am researching is random or
>  not.
>   And I want to know if my algorithm is good or not.

If you want to create a good pseudo random number generator,
(do you really want to compete with Marsaglia, Ecuyer, ...??) create
one million of 32bit unsigned integers, write it to a binary file and
throw it into DIEHARD. If your numbers pass all tests and are very easy
to create, your method might be eventually chosen in Microsoft's Excel
2025. I was bit of surprised, that MS now use a reasonable
generator in Excel 2003, after having ignored the complaints of the
whole statistical community for many years.


>    Is there any instant and easy method to test if a sequence has some
>  randomness?
>    Sorry, I might be saying an impossible thing.
>     Yasutoshi

More on "random" numbers in another post.

Hugo Pfoertner

More information about the SeqFan mailing list