Data Mining EIS

Joe Crump joecr at microsoft.com
Wed Jul 26 21:45:03 CEST 2000


I agree something like this would be nice.

I've "rolled my own" collection of all the sequences
in the past to do things similar to what Michael Somos describes.

What I'd like, in addition to a simple one-file
database of the sequences, is another one-file
database of all the unique numbers that make up
the sequences (sorted of course).
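Here is a minimal C++ sketch of how that second file could be built. It assumes the sequences have already been read into memory and that every term fits in a signed 64-bit integer. Many OEIS terms are far larger, so a full version would need a big-integer type. uniqueTerms and containsTerm are hypothetical names. Keeping the list sorted means a lookup is a binary search rather than a full scan:

```cpp
#include <algorithm>
#include <vector>

// Merge the terms of every sequence into one sorted list of distinct values.
// Assumes each term fits in a long long; real entries can be much larger.
std::vector<long long> uniqueTerms(const std::vector<std::vector<long long>>& seqs) {
    std::vector<long long> all;
    for (const auto& s : seqs)
        all.insert(all.end(), s.begin(), s.end());
    std::sort(all.begin(), all.end());
    all.erase(std::unique(all.begin(), all.end()), all.end());  // drop repeats
    return all;
}

// Because the list is sorted, membership is a binary search.
bool containsTerm(const std::vector<long long>& sorted, long long x) {
    return std::binary_search(sorted.begin(), sorted.end(), x);
}
```

For example, merging 1, 1, 2, 5, 14, 42 with 0, 1, 1, 2, 3, 5, 8 gives 0, 1, 2, 3, 5, 8, 14, 42.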

I've found that iterating over the sequences in the database
often turns up useful solutions to problems (which may be
completely unrelated to what you're investigating!) :)

E.g.

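	/* num[] holds the maxnums numbers being searched; f, whatIWant
	   and report() stand for whatever test is being run */
	for (int i = 0; i < maxnums; i++)
		if (f(num[i]) == whatIWant)
			report(num[i]);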
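Here is the loop above, made self-contained as a C++ sketch. scanFor and digitSum are hypothetical names. The digit sum is only an arbitrary stand-in for f, and the hits are collected and returned rather than passed to report():

```cpp
#include <functional>
#include <vector>

// Return every n in nums with f(n) == target. nums plays the role of
// num[0..maxnums-1]; collecting hits replaces the call to report().
std::vector<long long> scanFor(const std::vector<long long>& nums,
                               const std::function<long long(long long)>& f,
                               long long target) {
    std::vector<long long> hits;
    for (long long n : nums)
        if (f(n) == target)
            hits.push_back(n);
    return hits;
}

// Hypothetical test function: the sum of the decimal digits of |n|.
long long digitSum(long long n) {
    // Unsigned arithmetic keeps negation of the most negative value well defined.
    unsigned long long m = n < 0 ? 0ULL - static_cast<unsigned long long>(n)
                                 : static_cast<unsigned long long>(n);
    long long s = 0;
    for (; m > 0; m /= 10)
        s += static_cast<long long>(m % 10);
    return s;
}
```

For example, scanning 1, 2, 5, 14, 42, 132, 429 for digit sum 6 returns 42 and 132.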

- Joe

-----Original Message-----
From: Michael Somos [mailto:somos at grail.cba.csuohio.edu]
Sent: Wednesday, July 26, 2000 3:36 PM
To: seqfan at ext.jussieu.fr
Subject: Data Mining EIS


Seqfans!

I have come across a situation which may not be unique. I would like
to run an analysis on the numerical sequences in the entire EIS. For
each sequence I would like to do a few tests. This might involve applying
some transform to the original sequence to generate another sequence and
then seeing if that sequence is already in the database. I might also
want to check for duplicates or close duplicates. I might want to check
for complementary pairs. In short, the possibilities are unlimited, but
first I need to have the numerical data. All I would require is a simple
text file with one sequence per line, like:

A000108 1,1,2,5,14,42,132,429,1430,4862,16796,58786,208012,742900,

with however many numerical values are given in the %S, %T, %U lines, and,
for signed sequences, the version in the %V, %W, %X lines as well. I figure
that the bzipped file should be about 4 MByte and could be put on the web. I
think other people might be interested in running through all the sequences
in this way. For my purposes, it does not matter much if the data is not
perfect, that is, if some dead or incorrect data is included. All that
matters is that most of the sequences are included. I can always look up
individually any sequences I am interested in studying further. How about
it, Neil?
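Here is a rough C++ sketch of the kind of pass described above. It assumes the file format shown: an A-number, a space, then comma-separated terms with a trailing comma. It also assumes the terms fit in a long long. The transform (first differences), the prefix length k, and all function names are illustrative choices, not anything defined by the EIS. Sequences are indexed by their first k terms. That way, "is the transformed sequence already in the database?" and "which entries are near-duplicates?" both become map lookups:

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Terms = std::vector<long long>;
using Entry = std::pair<std::string, Terms>;  // (A-number, terms)

// Parse a line such as "A000108 1,1,2,5,14,42," into its A-number and terms.
// Empty fields (e.g. from the trailing comma) are skipped.
// std::stoll throws std::out_of_range for terms too large for a long long.
Entry parseLine(const std::string& line) {
    std::istringstream in(line);
    std::string id, data;
    in >> id >> data;
    Terms terms;
    std::istringstream fields(data);
    std::string tok;
    while (std::getline(fields, tok, ','))
        if (!tok.empty()) terms.push_back(std::stoll(tok));
    return Entry(id, terms);
}

// Example transform (an arbitrary choice): first differences a(n+1) - a(n).
Terms firstDifferences(const Terms& a) {
    Terms d;
    for (std::size_t i = 1; i < a.size(); ++i) d.push_back(a[i] - a[i - 1]);
    return d;
}

// Map the first k terms of each sequence to the A-numbers that begin that way.
std::map<Terms, std::vector<std::string>> indexByPrefix(
        const std::vector<Entry>& db, std::size_t k) {
    std::map<Terms, std::vector<std::string>> index;
    for (const Entry& e : db)
        if (e.second.size() >= k)
            index[Terms(e.second.begin(), e.second.begin() + k)].push_back(e.first);
    return index;
}

// Pairs (A, B) where the transform of A agrees with B on the first k terms.
std::vector<std::pair<std::string, std::string>> transformMatches(
        const std::vector<Entry>& db, std::size_t k) {
    std::map<Terms, std::vector<std::string>> index = indexByPrefix(db, k);
    std::vector<std::pair<std::string, std::string>> out;
    for (const Entry& e : db) {
        Terms d = firstDifferences(e.second);
        if (d.size() < k) continue;
        auto it = index.find(Terms(d.begin(), d.begin() + k));
        if (it == index.end()) continue;
        for (const std::string& other : it->second)
            out.push_back(std::make_pair(e.first, other));
    }
    return out;
}

// Groups of A-numbers sharing their first k terms: candidate (near-)duplicates.
std::vector<std::vector<std::string>> prefixDuplicates(
        const std::vector<Entry>& db, std::size_t k) {
    std::vector<std::vector<std::string>> groups;
    for (const auto& kv : indexByPrefix(db, k))
        if (kv.second.size() > 1) groups.push_back(kv.second);
    return groups;
}
```

With k = 5 and a file containing them, the first differences of the natural numbers (A000027) would match the all-ones sequence (A000012). The differences of the squares (A000290) would match the odd numbers (A005408). Agreement on a prefix only flags candidates, and each hit still needs to be looked up individually.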

Shalom, Michael

-- 
Michael Somos <somos at grail.cba.csuohio.edu>     Cleveland State University
http://grail.cba.csuohio.edu/~somos/            Cleveland, Ohio, USA 44115