A124015 thoughts
Jonathan Post
jvospost3 at gmail.com
Sun Nov 5 02:05:28 CET 2006
Nice code. And what fascinating graphs! They have both periodicities and a
fractal appearance. Somewhere in there is the equivalent of Benford's Law
and Zipf's Law for the lengths of the names of numbers (as opposed to digit
distribution and power law).
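
To make that concrete, here is a rough sketch of how one might tabulate the
length distribution. It assumes eng[] from the quoted message below has
already been evaluated; the names upper, lengths and dist are mine, the bound
is arbitrary, and Tally and ListLogLogPlot need a recent Mathematica version.

```mathematica
(* Sketch only: assumes eng[] from the quoted message below is already defined. *)
(* upper is an arbitrary illustrative bound, not a value from this thread. *)
upper = 9999;
lengths = Table[StringLength[eng[n]], {n, 1, upper}];
(* Tally gives {length, count} pairs; Sort orders them by length. *)
dist = Sort[Tally[lengths]];
(* Log-log view, where a power law would show up as a roughly straight line. *)
ListLogLogPlot[dist]
```

Note that StringLength counts the spaces too; to count letters only, one
could strip them first with StringReplace[eng[n], " " -> ""].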
On 11/4/06, Joseph Biberstine <jrbibers at indiana.edu> wrote:
>
> For anyone interested, this Mathematica function, eng[], will (as
> usual, barring bugs) give the American English name for integers from 0 to
> 10^21-1. Extending the upper bound should be straightforward
> (just one line of code -- observe the trend).
>
> All output is lowercase and without hyphens. On the view that "and"
> should only ever be read for radix points, it is never printed here.
> Note that this implementation uses a text postprocessor and so is inherently
> ugly. Also note that the helper memoizes, for better or worse. Please,
> of course, report all bugs.
>
> Attached are some diagrams of the predictable distribution of
> StringLength[eng[n]] (note this includes spaces). The third ranges up
> to 10^9, sampling every 10^6.
>
> Here is sample output for random naturals in [0, 999999]:
>
> 40573, forty thousand five hundred seventy three
> 108097, one hundred eight thousand ninety seven
> 539799, five hundred thirty nine thousand seven hundred ninety nine
> 136868, one hundred thirty six thousand eight hundred sixty eight
> 597841, five hundred ninety seven thousand eight hundred forty one
> 272460, two hundred seventy two thousand four hundred sixty
> 135550, one hundred thirty five thousand five hundred fifty
> 402424, four hundred two thousand four hundred twenty four
> 543617, five hundred forty three thousand six hundred seventeen
> 868945, eight hundred sixty eight thousand nine hundred forty five
> 951779, nine hundred fifty one thousand seven hundred seventy nine
> 742845, seven hundred forty two thousand eight hundred forty five
> 84140, eighty four thousand one hundred forty
> 680145, six hundred eighty thousand one hundred forty five
> 82734, eighty two thousand seven hundred thirty four
> 154758, one hundred fifty four thousand seven hundred fifty eight
> 961084, nine hundred sixty one thousand eighty four
> 360944, three hundred sixty thousand nine hundred forty four
> 574416, five hundred seventy four thousand four hundred sixteen
> 164099, one hundred sixty four thousand ninety nine
>
> Here is the code:
>
> Clear[eng, engRaw, n];
> eng[n_] := (
> pre = engRaw[n];
> If[pre == "", pre = "zero"]; (* fixes "zero" *)
> pre = FixedPoint[StringReplace[#, "  " -> " "] &, pre]; (* no consecutive spaces *)
> pre = FixedPoint[StringReplace[#, RegularExpression[" $"] :> ""] &,
> pre] ; (* no trailing spaces *)
> pre);
> (* This raw function relies heavily on the postprocessor, which fixes
> "zero" and deals with whitespace *)
> engRaw[n_] := engRaw[n] = (* memo *)
> Which[
> n == 0, "",(* fixed in postprocessing *)
> n == 1, "one",
> n == 2, "two",
> n == 3, "three",
> n == 4, "four",
> n == 5, "five",
> n == 6, "six",
> n == 7, "seven",
> n == 8, "eight",
> n == 9, "nine",
> n == 10, "ten",
> n == 11, "eleven",
> n == 12, "twelve",
> n == 13, "thirteen",
> n == 14, "fourteen",
> n == 15, "fifteen",
> n == 16, "sixteen",
> n == 17, "seventeen",
> n == 18, "eighteen",
> n == 19, "nineteen",
> 20 <= n <= 29, "twenty " <> engRaw[Mod[n, 10]],
> 30 <= n <= 39, "thirty " <> engRaw[Mod[n, 10]],
> 40 <= n <= 49, "forty " <> engRaw[Mod[n, 10]],
> 50 <= n <= 59, "fifty " <> engRaw[Mod[n, 10]],
> 60 <= n <= 69, "sixty " <> engRaw[Mod[n, 10]],
> 70 <= n <= 79, "seventy " <> engRaw[Mod[n, 10]],
> 80 <= n <= 89, "eighty " <> engRaw[Mod[n, 10]],
> 90 <= n <= 99, "ninety " <> engRaw[Mod[n, 10]],
> 10^2 <= n <= 10^3 - 1, engRaw[Quotient[n, 10^2]] <> " hundred " <>
> engRaw[Mod[n, 10^2]],
> 10^3 <= n <= 10^6 - 1, engRaw[Quotient[n, 10^3]] <> " thousand " <>
> engRaw[Mod[n, 10^3]],
> 10^6 <= n <= 10^9 - 1, engRaw[Quotient[n, 10^6]] <> " million " <>
> engRaw[Mod[n, 10^6]],
> 10^9 <= n <= 10^12 - 1, engRaw[Quotient[n, 10^9]] <> " billion " <>
> engRaw[Mod[n, 10^9]],
> 10^12 <= n <= 10^15 - 1, engRaw[Quotient[n, 10^12]] <> " trillion "
> <> engRaw[Mod[n, 10^12]],
> 10^15 <= n <= 10^18 - 1, engRaw[Quotient[n, 10^15]] <>
> " quadrillion " <> engRaw[Mod[n, 10^15]],
> 10^18 <= n <= 10^21 - 1, engRaw[Quotient[n, 10^18]] <>
> " quintillion " <> engRaw[Mod[n, 10^18]],
> True, "Naturals beyond 10^21-1 are not supported"];
>
> -JRB
>
> Jonathan Post wrote:
> > Now that I think of it, logarithmic trends can be interesting. How
> > about "what is the distribution of the number of letters in the names of
> > numbers from one to one billion?"
> >
> > -- Jonathan Vos Post
> >
> > On 11/3/06, David Wilson <davidwwilson at comcast.net> wrote:
> >
> > The description of A124015 is
> >
> > %N A124015 Number of words with n letters in the National Scrabble
> > Association Dictionary.
> >
> > Some notes:
> >
> > - It should be noted that the NSAD includes a restricted set of
> > English words (words of 2 to 14 letters, no proper nouns or
> > derivatives, no words with non-alphabetic characters, e.g.,
> > contractions or hyphenated words), and it is this restricted set that
> > is being counted.
> >
> > - The edition of the NSAD used to create A124015 should be
> > specified, since the NSAD is continually edited. I do not
> > think that A124015 should change with each new edition of the NSAD,
> > because it may be referenced in other literature. On the other hand,
> > I don't think that a new sequence should be created for each new
> > edition of the NSAD unless the new edition exhibits some interesting
> > statistical departure from the current NSAD (e.g., in some future
> > NSAD, the number of 9-letter words exceeds the number of 8-letter
> > words). This is because I believe that the value of A124015 is
> > mainly to indicate a distribution of English word lengths.
> >
> > - On the other hand, I have always understood that 4-letter words
> > are most common in literature (where small words appear more often).
> > It might be interesting to create a sequence counting words of
> > length n in the King James Bible, War and Peace, or some other
> > suitably large and stable piece of English literature (we don't have
> > to go overboard on this, just one or two examples indicating word
> > length distributions in literature).
> >
> > - Another interesting idea: In various languages, what is the
> > distribution of the number of letters in the names of numbers from
> > one to (say) one million?
> >
> >
> >
> >
>
>
>