A124015 thoughts

Sun Nov 5 02:05:28 CET 2006

Nice code.  And what fascinating graphs! They have both periodicities and a
fractal appearance.  Somewhere in there is the equivalent of Benford's Law
and Zipf's Law, for the lengths of the names of numbers (as opposed to
digit distribution and power law).
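
For anyone who wants to poke at the distribution directly, here is a
minimal sketch.  It assumes eng[] from Joseph's message below has already
been evaluated; the names lengths and dist are mine, purely for
illustration.  It tallies the name lengths (spaces included) for 0 to 999:

```mathematica
(* Assumes eng[] from the quoted message below is already defined. *)
(* Name length, spaces included, for each n from 0 to 999. *)
lengths = Table[StringLength[eng[n]], {n, 0, 999}];

(* {length, count} pairs, sorted by length. *)
dist = SortBy[Tally[lengths], First];

(* Quick look at the shape of the distribution. *)
ListPlot[dist, Filling -> Axis, AxesLabel -> {"length", "count"}]
```

Raising the upper bound of the Table is the obvious way to look for the
periodic structure, though the memoization in engRaw will keep every
intermediate name in memory.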

On 11/4/06, Joseph Biberstine <jrbibers at indiana.edu> wrote:
>
>         For anyone interested, this Mathematica function, eng[], will (as
> usual, barring bugs) give the American English name for integers from 0
> to 10^21-1.  Extension of the upper bound should be plain to implement
> (just one line of code -- observe the trend).
>
>         All printing is lowercase and without hyphens.  In holding that
> "and" should only ever be read for radix points, it is never printed here.
> Note this implementation uses a text postprocessor and so is inherently
> ugly.  Also note that the helper memoizes for better or worse.  Please,
> of course, report all bugs.
>
>         Attached are some diagrams of the predictable distribution of
> StringLength[eng[n]] (note this includes spaces).  The third ranges up
> to 10^9, sampling every 10^6.
>
>         Here is sample output from random naturals in [0, 999999]:
>
> 40573, forty thousand five hundred seventy three
> 108097, one hundred eight thousand ninety seven
> 539799, five hundred thirty nine thousand seven hundred ninety nine
> 136868, one hundred thirty six thousand eight hundred sixty eight
> 597841, five hundred ninety seven thousand eight hundred forty one
> 272460, two hundred seventy two thousand four hundred sixty
> 135550, one hundred thirty five thousand five hundred fifty
> 402424, four hundred two thousand four hundred twenty four
> 543617, five hundred forty three thousand six hundred seventeen
> 868945, eight hundred sixty eight thousand nine hundred forty five
> 951779, nine hundred fifty one thousand seven hundred seventy nine
> 742845, seven hundred forty two thousand eight hundred forty five
> 84140, eighty four thousand one hundred forty
> 680145, six hundred eighty thousand one hundred forty five
> 82734, eighty two thousand seven hundred thirty four
> 154758, one hundred fifty four thousand seven hundred fifty eight
> 961084, nine hundred sixty one thousand eighty four
> 360944, three hundred sixty thousand nine hundred forty four
> 574416, five hundred seventy four thousand four hundred sixteen
> 164099, one hundred sixty four thousand ninety nine
>
>         Here is the code:
>
> Clear[eng, engRaw, n];
> eng[n_] := (
>   pre = engRaw[n];
>   If[pre == "", pre = "zero"]; (* fixes "zero" *)
>   pre = FixedPoint[StringReplace[#, "  " -> " "] &, pre] ; (* no
> consecutive spaces *)
>   pre = FixedPoint[StringReplace[#, RegularExpression[" $"] :> ""] &,
> pre] ; (* no trailing spaces *)
>   pre);
> (* This raw function relies heavily on the postprocessor, which fixes
> "zero" and deals with whitespace *)
> engRaw[n_] := engRaw[n] = (* memo *)
>   Which[
>     n == 0, "",(* fixed in postprocessing *)
>     n == 1, "one",
>     n == 2, "two",
>     n == 3, "three",
>     n == 4, "four",
>     n == 5, "five",
>     n == 6, "six",
>     n == 7, "seven",
>     n == 8, "eight",
>     n == 9, "nine",
>     n == 10, "ten",
>     n == 11, "eleven",
>     n == 12, "twelve",
>     n == 13, "thirteen",
>     n == 14, "fourteen",
>     n == 15, "fifteen",
>     n == 16, "sixteen",
>     n == 17, "seventeen",
>     n == 18, "eighteen",
>     n == 19, "nineteen",
>     20 <= n <= 29, "twenty " <> engRaw[Mod[n, 10]],
>     30 <= n <= 39, "thirty " <> engRaw[Mod[n, 10]],
>     40 <= n <= 49, "forty " <> engRaw[Mod[n, 10]],
>     50 <= n <= 59, "fifty " <> engRaw[Mod[n, 10]],
>     60 <= n <= 69, "sixty " <> engRaw[Mod[n, 10]],
>     70 <= n <= 79, "seventy " <> engRaw[Mod[n, 10]],
>     80 <= n <= 89, "eighty " <> engRaw[Mod[n, 10]],
>     90 <= n <= 99, "ninety " <> engRaw[Mod[n, 10]],
>     10^2 <= n <= 10^3 - 1, engRaw[Quotient[n, 10^2]] <> " hundred " <>
> engRaw[Mod[n, 10^2]],
>     10^3 <= n <= 10^6 - 1, engRaw[Quotient[n, 10^3]] <> " thousand " <>
> engRaw[Mod[n, 10^3]],
>     10^6 <= n <= 10^9 - 1, engRaw[Quotient[n, 10^6]] <> " million " <>
> engRaw[Mod[n, 10^6]],
>     10^9 <= n <= 10^12 - 1, engRaw[Quotient[n, 10^9]] <> " billion " <>
> engRaw[Mod[n, 10^9]],
>     10^12 <= n <= 10^15 - 1, engRaw[Quotient[n, 10^12]] <> " trillion "
> <> engRaw[Mod[n, 10^12]],
>     10^15 <= n <= 10^18 - 1, engRaw[Quotient[n, 10^15]] <>
> " quadrillion " <> engRaw[Mod[n, 10^15]],
>     10^18 <= n <= 10^21 - 1, engRaw[Quotient[n, 10^18]] <>
> " quintillion " <> engRaw[Mod[n, 10^18]],
>     True, Return["Naturals beyond 10^21-1 are not supported"];];
>
> -JRB
>
> Jonathan Post wrote:
> > Now that I think of it, logarithmic trends can be interesting.  How
> > about "what is the distribution of the number of letters in the names of
> > numbers from one to one billion?"
> >
> > -- Jonathan Vos Post
> >
> > On 11/3/06, David Wilson <davidwwilson at comcast.net> wrote:
> >
> >     The description of A124015 is
> >
> >     %N A124015 Number of words with n letters in the National Scrabble
> >     Association Dictionary.
> >
> >     Some notes:
> >
> >     - It should be noted that the NSAD includes a restricted set of
> >     English words (words of 2 to 14 letters, no proper nouns or
> >     derivatives, no words with non-alphabetic characters, e.g.,
> >     contractions and hyphenated words), and it is this restricted set
> >     that is being counted.
> >
> >     - The edition of the NSAD used to create A124015 should be
> >     specified, since the NSAD is continually edited.  I do not
> >     think that A124015 should change with each new edition of the NSAD,
> >     because it may be referenced in other literature. On the other hand,
> >     I don't think that a new sequence should be created for each new
> >     edition of the NSAD unless the new edition exhibits some interesting
> >     statistical departure from the current NSAD (e.g., in some future
> >     NSAD, the number of 9-letter words exceeds the number of 8-letter
> >     words). This is because I believe that the value of A124015 is
> >     mainly to indicate a distribution of English word lengths.
> >
> >     - On the other hand, I have always understood that 4-letter words
> >     are most common in literature (where small words appear more often).
> >     It might be interesting to create a sequence counting words of
> >     length n in the King James Bible, War and Peace, or some other
> >     suitably large and stable piece of English literature (we don't have
> >     to go overboard on this, just one or two examples indicating word
> >     length distributions in literature).
> >
> >     - Another interesting idea: In various languages, what is the
> >     distribution of the number of letters in the names of numbers from
> >     one to (say) one million?
> >
> >
> >
> >
>
>
>