Data Mining EIS

Michael Somos somos at grail.cba.csuohio.edu
Wed Jul 26 21:35:56 CEST 2000


Seqfans!

I have come across a situation which may not be unique. I would like
to run an analysis on the numerical sequences in the entire EIS. For
each sequence I would like to do a few tests. This might involve applying
some transform to the original sequence to generate another sequence and
then seeing if that sequence is already in the database. I might also
want to check for duplicates or close duplicates. I might want to check
for complementary pairs. In short, the possibilities are unlimited, but
first I need to have the numerical data. All I would require is a simple
text file with one sequence per line, like this:

A000108 1,1,2,5,14,42,132,429,1430,4862,16796,58786,208012,742900,

with however many numerical values are given in the %S, %T, %U lines, and
for signed sequences the version in the %V, %W, %X lines also. I figure that
the bzipped file should be about 4 MByte and could be put on the web. I think
other people might be interested in running through all the sequences in
this way. For my purposes, it does not matter much if the data is not
perfect, that is, if some dead or incorrect data is included. All that
matters is that most of the sequences are included. I can always look up
individually any sequences I am interested in studying further. How about
it, Neil?
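As a minimal sketch of the kind of test described above, and assuming a file in the proposed one-line-per-sequence format, the Python below reads such lines, applies a transform, and reports other sequences that the transformed result agrees with. It also lists pairs of sequences that agree on every term they share, a crude stand-in for a duplicate check. First differences is used only as an illustrative transform; the function names and the 8-term prefix threshold `k` are hypothetical choices, not part of any existing OEIS tool.

```python
from collections import defaultdict

# Assumed input format (as proposed): "A000108 1,1,2,5,14,42,"
# Empty fields, such as the one left by a trailing comma, are skipped.
def parse_line(line):
    anum, _, data = line.strip().partition(" ")
    terms = [int(t) for t in data.split(",") if t.strip()]
    return anum, terms

def load(lines):
    """Map each A-number to its list of terms, skipping blank lines."""
    return dict(parse_line(line) for line in lines if line.strip())

# Illustrative transform only (not taken from the post): first differences.
def first_differences(terms):
    return [b - a for a, b in zip(terms, terms[1:])]

def prefix_index(seqs, k):
    """Index every sequence with at least k terms by its first k terms."""
    index = defaultdict(set)
    for anum, terms in seqs.items():
        if len(terms) >= k:
            index[tuple(terms[:k])].add(anum)
    return index

def agrees(a, b):
    """True if a and b agree on all the terms both of them have."""
    n = min(len(a), len(b))
    return a[:n] == b[:n]

def transform_matches(seqs, transform, k=8):
    """For each sequence, list the other sequences its transform agrees with."""
    index = prefix_index(seqs, k)
    matches = {}
    for anum, terms in seqs.items():
        t = transform(terms)
        if len(t) < k:
            continue  # too short to look up reliably
        hits = sorted(h for h in index.get(tuple(t[:k]), ())
                      if h != anum and agrees(t, seqs[h]))
        if hits:
            matches[anum] = hits
    return matches

def duplicate_pairs(seqs, k=8):
    """Pairs sharing their first k terms and agreeing on all shared terms."""
    pairs = []
    for group in prefix_index(seqs, k).values():
        names = sorted(group)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if agrees(seqs[a], seqs[b]):
                    pairs.append((a, b))
    return sorted(pairs)

data = [
    "A000004 0,0,0,0,0,0,0,0,0,0,",
    "A000012 1,1,1,1,1,1,1,1,1,1,",
    "A000027 1,2,3,4,5,6,7,8,9,10,",
    "A000079 1,2,4,8,16,32,64,128,256,512,",
    "A000108 1,1,2,5,14,42,132,429,1430,4862,",
]
seqs = load(data)
print(transform_matches(seqs, first_differences))
# {'A000012': ['A000004'], 'A000027': ['A000012']}
print(duplicate_pairs(seqs))
# []
```

Indexing on a fixed-length prefix keeps each lookup to a single dictionary probe, so a pass over the whole table stays roughly linear in the number of sequences; the `agrees` check then guards against accidental prefix collisions. Other transforms (partial sums, binomial transform, and so on) can be passed in place of `first_differences` without changing the rest.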

Shalom, Michael

-- 
Michael Somos <somos at grail.cba.csuohio.edu>     Cleveland State University
http://grail.cba.csuohio.edu/~somos/            Cleveland, Ohio, USA 44115
