Nice code.  And what fascinating graphs! They have both

periodicities and a fractal appearance.  Somewhere in there is the

equivalent of Benford's Law, and Zipf's Law, for the lengths of the names of

numbers (as opposed to digit distribution and power law).<br><br><div><span class="gmail_quote">On 11/4/06, <b class="gmail_sendername">Joseph Biberstine</b> <<a href="mailto:jrbibers@indiana.edu">jrbibers@indiana.edu

</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">        For anyone interested, this Mathematica function, eng[], will (as

<br>usual, barring bugs) give the American English name for integers on 0 to<br>10^21-1.  Extension of the upper bound should be plain to implement<br>(just one line of code -- observe the trend).<br><br>        All printing is lowercase and without hyphens.  In holding that "and"

<br>should only ever be read for radix points, it is never printed here.<br>Note this implementation uses a text postprocessor and so is inherently<br>ugly.  Also note that the helper memoizes for better or worse.  Please,

<br>of course, report all bugs.<br><br>        Attached are some diagrams of the predictable distribution of<br>StringLength[eng[n]] (note this includes spaces).  The third ranges up<br>to 10^9, sampling every 10^6.<br><br>

        Here is sample output from random naturals on 0,999999:<br><br> 40573, forty thousand five hundred seventy three<br>108097, one hundred eight thousand ninety seven<br>539799, five hundred thirty nine thousand seven hundred ninety nine

<br>136868, one hundred thirty six thousand eight hundred sixty eight<br>597841, five hundred ninety seven thousand eight hundred forty one<br>272460, two hundred seventy two thousand four hundred sixty<br>135550, one hundred thirty five thousand five hundred fifty

<br>402424, four hundred two thousand four hundred twenty four<br>543617, five hundred forty three thousand six hundred seventeen<br>868945, eight hundred sixty eight thousand nine hundred forty five<br>951779, nine hundred fifty one thousand seven hundred seventy nine

<br>742845, seven hundred forty two thousand eight hundred forty five<br> 84140, eighty four thousand one hundred forty<br>680145, six hundred eighty thousand one hundred forty five<br> 82734, eighty two thousand seven hundred thirty four

<br>154758, one hundred fifty four thousand seven hundred fifty eight<br>961084, nine hundred sixty one thousand eighty four<br>360944, three hundred sixty thousand nine hundred forty four<br>574416, five hundred seventy four thousand four hundred sixteen

<br>164099, one hundred sixty four thousand ninety nine<br><br>        Here is the code:<br><br>Clear[eng, engRaw, n];<br>eng[n_] := (<br>  pre = engRaw[n];<br>  If[pre == "", pre = "zero"]; (* fixes "zero" *)

<br>  pre = FixedPoint[StringReplace[#, "  " -> " "] &, pre] ; (* no<br>consecutive spaces *)<br>  pre = FixedPoint[StringReplace[#, RegularExpression[" $"] :> ""] &,<br>

pre] ; (* no trailing spaces *)<br>  pre);<br>(* This raw function relies heavily on the postprocessor, which fixes<br>"zero" and deals with whitespace *)<br>engRaw[n_] := engRaw[n] = (* memo *)<br>  Which[<br>    n == 0, "",(* fixed in postprocessing *)

<br>    n == 1, "one",<br>    n == 2, "two",<br>    n == 3, "three",<br>    n == 4, "four",<br>    n == 5, "five",<br>    n == 6, "six",<br>    n == 7, "seven",

<br>    n == 8, "eight",<br>    n == 9, "nine",<br>    n == 10, "ten",<br>    n == 11, "eleven",<br>    n == 12, "twelve",<br>    n == 13, "thirteen",<br>    n == 14, "fourteen",

<br>    n == 15, "fifteen",<br>    n == 16, "sixteen",<br>    n == 17, "seventeen",<br>    n == 18, "eighteen",<br>    n == 19, "nineteen",<br>    20 <= n <= 29, "twenty " <> engRaw[Mod[n, 10]],

<br>    30 <= n <= 39, "thirty " <> engRaw[Mod[n, 10]],<br>    40 <= n <= 49, "forty " <> engRaw[Mod[n, 10]],<br>    50 <= n <= 59, "fifty " <> engRaw[Mod[n, 10]],

<br>    60 <= n <= 69, "sixty " <> engRaw[Mod[n, 10]],<br>    70 <= n <= 79, "seventy " <> engRaw[Mod[n, 10]],<br>    80 <= n <= 89, "eighty " <> engRaw[Mod[n, 10]],

<br>    90 <= n <= 99, "ninety " <> engRaw[Mod[n, 10]],<br>    10^2 <= n <= 10^3 - 1, engRaw[Quotient[n, 10^2]] <> " hundred " <><br>engRaw[Mod[n, 10^2]],<br>    10^3 <= n <= 10^6 - 1, engRaw[Quotient[n, 10^3]] <> " thousand " <>

<br>engRaw[Mod[n, 10^3]],<br>    10^6 <= n <= 10^9 - 1, engRaw[Quotient[n, 10^6]] <> " million " <><br>engRaw[Mod[n, 10^6]],<br>    10^9 <= n <= 10^12 - 1, engRaw[Quotient[n, 10^9]] <> " billion " <>

<br>engRaw[Mod[n, 10^9]],<br>    10^12 <= n <= 10^15 - 1, engRaw[Quotient[n, 10^12]] <> " trillion " <><br>engRaw[Mod[n, 10^12]],<br>    10^15 <= n <= 10^18 - 1, engRaw[Quotient[n, 10^15]] <> " quadrillion " <>

<br>engRaw[Mod[n, 10^15]],<br>    10^18 <= n <= 10^21 - 1, engRaw[Quotient[n, 10^18]] <> " quintillion " <><br>engRaw[Mod[n, 10^18]],<br>    True, Return["Naturals beyond 10^21-1 are not supported"];];

<br><br>-JRB<br><br>Jonathan Post wrote:<br>> Now that I think of it, logarithmic trends can be interesting.  How<br>> about "what is the distribution of the number of letters in the names of<br>> numbers from one to one billion?"

<br>><br>> -- Jonathan Vos Post<br>><br>> On 11/3/06, *David Wilson* <<a href="mailto:davidwwilson@comcast.net">davidwwilson@comcast.net</a><br>> <mailto:<a href="mailto:davidwwilson@comcast.net">davidwwilson@comcast.net

</a>>> wrote:<br>><br>>     The description of A124015 is<br>><br>>     %N A124015 Number of words with n letters in the National Scrabble<br>>     Association Dictionary.<br>><br>>     Some notes:

<br>><br>>     - It should be noted that the NSAD includes a restricted set of<br>>     English words (words of 2 to 14 letters, no proper nouns or<br>>     derivatives, no words with non-alphabetic characters (

e.g.,<br>>     contractions, hyphenated words)), and it is this restricted set that<br>>     is being counted.<br>><br>>     - The edition of the NSAD used to create A124015 should be<br>>     specified, since the NSAD is continually edited.  I do not

<br>>     think that A124015 should change with each new edition of the NSAD,<br>>     because it may be referenced in other literature. On the other hand,<br>>     I don't think that a new sequence should be created for each new

<br>>     edition of the NSAD unless the new edition exhibits some interesting<br>>     statistical departure from the current NSAD (e.g., in some future<br>>     NSAD, the number of 9-letter words exceeds the number of 8-letter

<br>>     words). This is because I believe that the value of A124015 is<br>>     mainly to indicate a distribution of English word lengths.<br>><br>>     - On the other hand, I have always understood that 4-letter words

<br>>     are most common in literature (where small words appear more often).<br>>     It might be interesting to create a sequence counting words of<br>>     length n in the King James Bible, War and Peace, or some other

<br>>     suitably large and stable piece of English literature (we don't have<br>>     to go overboard on this, just one or two examples indicating word<br>>     length distributions in literature).<br>><br>>     - Another interesting idea: In various languages, what is the

<br>>     distribution of the number of letters in the names of numbers from<br>>     one to (say) one million?<br><br></blockquote></div><br>
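A rough way to tabulate the length distribution discussed above is sketched below in Mathematica. It is not the eng[] from the quoted post: engSketch, chunk, small, tensWords and scaleWords are hypothetical names for a table-driven variant, and the sketch assumes a Mathematica version that provides StringRiffle (10.1 or later). Like eng[], it prints lowercase, without hyphens and without "and".

```mathematica
(* Sketch only: every name defined here is hypothetical, not from the original post. *)
small = {"one", "two", "three", "four", "five", "six", "seven", "eight",
   "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
   "sixteen", "seventeen", "eighteen", "nineteen"};
tensWords = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy",
   "eighty", "ninety"};
(* One entry per group of three digits, least significant first. *)
scaleWords = {"", "thousand", "million", "billion", "trillion",
   "quadrillion", "quintillion"};

(* Words for 0 <= k <= 999 as a list; the empty list stands for zero. *)
chunk[k_] := Which[
   k == 0, {},
   k < 20, {small[[k]]},
   k < 100, Prepend[chunk[Mod[k, 10]], tensWords[[Quotient[k, 10] - 1]]],
   True, Join[{small[[Quotient[k, 100]]], "hundred"}, chunk[Mod[k, 100]]]];

engSketch[0] = "zero";
engSketch[n_Integer /; 0 < n < 1000^Length[scaleWords]] :=
  Module[{groups, words},
   (* Base-1000 digits, reversed so that groups[[i]] pairs with scaleWords[[i]]. *)
   groups = Reverse[IntegerDigits[n, 1000]];
   words = Table[
     If[groups[[i]] == 0, {}, Append[chunk[groups[[i]]], scaleWords[[i]]]],
     {i, Length[groups]}];
   (* Most significant group first; drop the empty scale word of the units group. *)
   StringRiffle[DeleteCases[Flatten[Reverse[words]], ""], " "]];

(* Spot check against the sample output in the post:
   engSketch[40573] should give "forty thousand five hundred seventy three". *)
engSketch[40573]

(* {length, count} pairs for 0..999; lengths include spaces, as in the post. *)
lengthTally = SortBy[Tally[StringLength /@ engSketch /@ Range[0, 999]], First]
```

Because the upper bound is derived from Length[scaleWords], appending one more name to scaleWords extends the supported range by three digits. Passing the same lengths to ListPlot instead of Tally should give a picture comparable to the attached diagrams.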