A124015 thoughts

Sat Nov 4 23:17:45 CET 2006

	For anyone interested, this Mathematica function, eng[], will
(barring bugs, as usual) give the American English name of any integer
from 0 to 10^21-1.  Extending the upper bound should be straightforward
(one more Which clause per power of 10^3; observe the pattern).

	All output is lowercase and without hyphens.  Since I hold that "and"
should only ever be read at the radix point, it is never printed here.
Note that this implementation relies on a text postprocessor and so is
inherently ugly.  Also note that the helper, engRaw[], memoizes its
results, for better or worse.  Please, of course, report any bugs.

	Attached are some diagrams of the predictable distribution of
StringLength[eng[n]] (note this includes spaces).  The third ranges up
to 10^9, sampling every 10^6.
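
	A tally of this kind could be produced roughly as follows.  This is
only a sketch: lengthTally is a hypothetical helper name, not something
used for the attached diagrams, and it takes the naming function as an
argument so that it can be tried with a built-in stand-in first.

```mathematica
(* lengthTally is a hypothetical helper, not part of the original post.
   It tallies StringLength[f[n]] for n = 0, step, 2 step, ..., up to nmax
   and returns {length, count} pairs sorted by length. *)
lengthTally[f_, nmax_Integer, step_Integer : 1] :=
  SortBy[Tally[Table[StringLength[f[n]], {n, 0, nmax, step}]], First];

(* Stand-in check that runs without eng[]: lengths of decimal strings *)
lengthTally[IntegerString, 999]
(* {{1, 10}, {2, 90}, {3, 900}} *)
```

	Once eng[] from the code further down has been evaluated,
lengthTally[eng, 999] gives the exhaustive tally for 0 to 999, and
lengthTally[eng, 10^9, 10^6] samples every 10^6 up to 10^9; either
result can be passed to ListPlot.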

	Here is sample output for random naturals in the range 0 to 999999:

 40573, forty thousand five hundred seventy three
108097, one hundred eight thousand ninety seven
539799, five hundred thirty nine thousand seven hundred ninety nine
136868, one hundred thirty six thousand eight hundred sixty eight
597841, five hundred ninety seven thousand eight hundred forty one
272460, two hundred seventy two thousand four hundred sixty
135550, one hundred thirty five thousand five hundred fifty
402424, four hundred two thousand four hundred twenty four
543617, five hundred forty three thousand six hundred seventeen
868945, eight hundred sixty eight thousand nine hundred forty five
951779, nine hundred fifty one thousand seven hundred seventy nine
742845, seven hundred forty two thousand eight hundred forty five
 84140, eighty four thousand one hundred forty
680145, six hundred eighty thousand one hundred forty five
 82734, eighty two thousand seven hundred thirty four
154758, one hundred fifty four thousand seven hundred fifty eight
961084, nine hundred sixty one thousand eighty four
360944, three hundred sixty thousand nine hundred forty four
574416, five hundred seventy four thousand four hundred sixteen
164099, one hundred sixty four thousand ninety nine

	Here is the code:

Clear[eng, engRaw, n];
(* Module keeps pre local instead of leaking it as a global symbol *)
eng[n_] := Module[{pre},
  pre = engRaw[n];
  If[pre === "", pre = "zero"]; (* fixes "zero" *)
  (* no consecutive spaces *)
  pre = FixedPoint[StringReplace[#, "  " -> " "] &, pre];
  (* no trailing spaces *)
  pre = FixedPoint[StringReplace[#, RegularExpression[" $"] :> ""] &, pre];
  pre];
(* This raw function relies heavily on the postprocessor, which fixes
"zero" and deals with whitespace *)
engRaw[n_] := engRaw[n] = (* memo *)
  Which[
    n == 0, "",(* fixed in postprocessing *)
    n == 1, "one",
    n == 2, "two",
    n == 3, "three",
    n == 4, "four",
    n == 5, "five",
    n == 6, "six",
    n == 7, "seven",
    n == 8, "eight",
    n == 9, "nine",
    n == 10, "ten",
    n == 11, "eleven",
    n == 12, "twelve",
    n == 13, "thirteen",
    n == 14, "fourteen",
    n == 15, "fifteen",
    n == 16, "sixteen",
    n == 17, "seventeen",
    n == 18, "eighteen",
    n == 19, "nineteen",
    20 <= n <= 29, "twenty " <> engRaw[Mod[n, 10]],
    30 <= n <= 39, "thirty " <> engRaw[Mod[n, 10]],
    40 <= n <= 49, "forty " <> engRaw[Mod[n, 10]],
    50 <= n <= 59, "fifty " <> engRaw[Mod[n, 10]],
    60 <= n <= 69, "sixty " <> engRaw[Mod[n, 10]],
    70 <= n <= 79, "seventy " <> engRaw[Mod[n, 10]],
    80 <= n <= 89, "eighty " <> engRaw[Mod[n, 10]],
    90 <= n <= 99, "ninety " <> engRaw[Mod[n, 10]],
    10^2 <= n <= 10^3 - 1,
      engRaw[Quotient[n, 10^2]] <> " hundred " <> engRaw[Mod[n, 10^2]],
    10^3 <= n <= 10^6 - 1,
      engRaw[Quotient[n, 10^3]] <> " thousand " <> engRaw[Mod[n, 10^3]],
    10^6 <= n <= 10^9 - 1,
      engRaw[Quotient[n, 10^6]] <> " million " <> engRaw[Mod[n, 10^6]],
    10^9 <= n <= 10^12 - 1,
      engRaw[Quotient[n, 10^9]] <> " billion " <> engRaw[Mod[n, 10^9]],
    10^12 <= n <= 10^15 - 1,
      engRaw[Quotient[n, 10^12]] <> " trillion " <> engRaw[Mod[n, 10^12]],
    10^15 <= n <= 10^18 - 1,
      engRaw[Quotient[n, 10^15]] <> " quadrillion " <> engRaw[Mod[n, 10^15]],
    10^18 <= n <= 10^21 - 1,
      engRaw[Quotient[n, 10^18]] <> " quintillion " <> engRaw[Mod[n, 10^18]],
    True, "Naturals beyond 10^21-1 are not supported"];
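
	For comparison, here is a table-driven sketch that avoids the text
postprocessor and makes raising the upper bound a matter of appending
one name to a list.  It is not the code used for the output above; the
names small, tens, scales, below1000 and engTable are all illustrative
assumptions.

```mathematica
(* Hypothetical table-driven variant; small, tens, scales, below1000 and
   engTable are illustrative names, not part of the original code. *)
small = {"one", "two", "three", "four", "five", "six", "seven", "eight",
   "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
   "sixteen", "seventeen", "eighteen", "nineteen"};
tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy",
   "eighty", "ninety"};
(* Append one name here to raise the upper bound by a factor of 10^3 *)
scales = {"thousand", "million", "billion", "trillion", "quadrillion",
   "quintillion"};

(* Words for 0 <= n <= 999 as a list of strings; empty list for 0 *)
below1000[n_Integer] := Which[
   n == 0, {},
   n < 20, {small[[n]]},
   n < 100, Prepend[below1000[Mod[n, 10]], tens[[Quotient[n, 10] - 1]]],
   True, Join[{small[[Quotient[n, 100]]], "hundred"},
     below1000[Mod[n, 100]]]];

engTable[0] = "zero";
engTable[n_Integer?Positive] /; n < 10^(3 (Length[scales] + 1)) :=
  Module[{groups, words},
   (* base-1000 digits, least significant group first *)
   groups = Reverse[IntegerDigits[n, 1000]];
   (* name each nonzero group, then attach its scale word if it has one *)
   words = MapIndexed[
     Which[
       #1 == 0, {},
       First[#2] == 1, below1000[#1],
       True, Append[below1000[#1], scales[[First[#2] - 1]]]] &,
     groups];
   StringJoin[Riffle[Flatten[Reverse[words]], " "]]];

engTable[108097]
(* "one hundred eight thousand ninety seven" *)
```

	Because the words are collected in a list and joined once at the
end, no stray spaces arise; appending "sextillion" to scales would
extend the range to 10^24-1.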

-JRB

Jonathan Post wrote:
> Now that I think of it, logarithmic trends can be interesting.  How
> about "what is the distribution of the number of letters in the names of
> numbers from one to one billion?"
> 
> -- Jonathan Vos Post
> 
> On 11/3/06, David Wilson <davidwwilson at comcast.net> wrote:
> 
>     The description of A124015 is
>      
>     %N A124015 Number of words with n letters in the National Scrabble
>     Association Dictionary.
>      
>     Some notes:
>      
>     - It should be noted that the NSAD includes a restricted set of
>     English words (words of 2 to 14 letters, no proper nouns or
>     derivatives, no words with non-alphabetic characters (e.g.,
>     contractions, hyphenated words)), and it is this restricted set that
>     is being counted.
>      
>     - The edition of the NSAD used to create A124015 should be
>     specified, since the NSAD is continually edited.  I do not
>     think that A124015 should change with each new edition of the NSAD,
>     because it may be referenced in other literature. On the other hand,
>     I don't think that a new sequence should be created for each new
>     edition of the NSAD unless the new edition exhibits some interesting
>     statistical departure from the current NSAD (e.g., in some future
>     NSAD, the number of 9-letter words exceeds the number of 8-letter
>     words). This is because I believe that the value of A124015 is
>     mainly to indicate a distribution of English word lengths.
>      
>     - On the other hand, I have always understood that 4-letter words
>     are most common in literature (where small words appear more often).
>     It might be interesting to create a sequence counting words of
>     length n in the King James Bible, War and Peace, or some other
>     suitably large and stable piece of English literature (we don't have
>     to go overboard on this, just one or two examples indicating word
>     length distributions in literature).
>      
>     - Another interesting idea: In various languages, what is the
>     distribution of the number of letters in the names of numbers from
>     one to (say) one million?
>      
>      
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: EnglishNameDiagrams.PNG
Type: image/png
Size: 28453 bytes
Desc: not available
URL: <http://list.seqfan.eu/pipermail/seqfan/attachments/20061104/6d824ad8/attachment-0002.png>