Search question

cino hilliard hillcino368 at hotmail.com
Sun Jan 15 07:04:55 CET 2006


Hi Frank et al,

>From: Russ Cox <rsc at swtch.com>
>To: "franktaw at netscape.net" <franktaw at netscape.net>
>CC: seqfan at ext.jussieu.fr
>Subject: Re: Search question
>Date: Sat, 14 Jan 2006 14:30:09 -0500
>
> > I am currently trying to identify sequences that need more entries, but do
> > not have the "more" (or "full") keyword.  I can search for "-keyword:more
> > -keyword:full", but I would like to add to that a search for the absence of
> > a "%U" line.  Is there any way I can do this, or that it could easily be
> > enabled?
>
>There isn't a way to do this; the machinery treats all
>sequence data lines the same.
>
>Russ
>

This can be done by downloading the 114 parts of the database and mining
what you want from them with software. I have written BCX (C) programs that
do this. Extracting the parts takes 1 minute with a cable connection.
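
For anyone who wants to try this without BCX, here is a minimal Python sketch of the same kind of mining. It is not the program mentioned above. It assumes the downloaded parts are plain text in the OEIS internal format, where each line starts with a tag (%S, %T, %U for data, %K for keywords) followed by the A-number. The function name one_liners_by_keyword is made up for illustration. A sequence counts as a "one liner" here when it has a %S line but no %T or %U line.

```python
from collections import defaultdict

DATA_TAGS = {"%S", "%T", "%U"}  # the three sequence-data line tags


def one_liners_by_keyword(lines):
    """Map each keyword to (total sequences, A-numbers that have only a %S line)."""
    data_seen = defaultdict(set)  # A-number -> data tags present
    keywords = defaultdict(set)   # A-number -> keywords from its %K line

    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue
        tag, anum = parts[0], parts[1]
        if tag in DATA_TAGS:
            data_seen[anum].add(tag)
        elif tag == "%K" and len(parts) == 3:
            keywords[anum].update(
                k.strip() for k in parts[2].split(",") if k.strip()
            )

    totals = defaultdict(int)
    one_liners = defaultdict(list)
    for anum in sorted(keywords):
        # Only the first data row is present: %S seen, %T and %U absent.
        only_first_row = data_seen.get(anum) == {"%S"}
        for kw in keywords[anum]:
            totals[kw] += 1
            if only_first_row:
                one_liners[kw].append(anum)
    return {kw: (totals[kw], one_liners[kw]) for kw in totals}
```

Feeding it the lines of all the downloaded parts, one after another, would give per-keyword totals and lists of the kind shown below. How the parts are named and stored on disk is left open here.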

Here are some examples of sequences that have only one line of data, by
criterion "keyword."

Keyword: dumb
A004740,A019440,A026081,A048659,A055200,A058230,A059916,A059969,A082390,A084912,
A085808,A101944,A102701,A102705,A103132,A107081,A108159,A111070,A111157,A111198,
A112733,A112747,A112748,A112749,A112750,A112766,A112767,A112782,A112783,A112784,
A112785,A112786,A113172,
Total 33 one liners out of 63 dumb sequences

Keyword: hard
A000066,A000162,A000236,A000341,A000348,A000372,A000375,A000376,A000403,A000438,
A000474,A000509,A000510,A000512,A000528,A000530,A000532,A000609,A000637,A000638,
A000679,A000769,A000789,A000791,A000882,A000937,A000952,A000983,A001071,A001072,
....
....
A112245,A112284,A112535,A112548,A112723,A112724,A112741,A112853,A112855,A112874,
A112879,A112880,A113276,A113457,A113459,A113461,A114601,A114628,A114629,A114630,
A114631,A114632,A114648,A114649,A114665,A114670,A114676,A114714,A114716,A087306,
Total 1690 one liners out of 2439 hard sequences

Some more statistics

keyword nice
6049 nice Sequences
939 nice With only one line of terms

keyword uned
956 uned Sequences
251 uned With only one line of terms

keyword easy
28794 easy Sequences
1210 easy With only one line of terms

keyword nonn
106689 nonn Sequences
13717 nonn With only one line of terms

keyword sign
6909 sign Sequences
234 sign With only one line of terms

Etc Etc

Tell me which files you want, or whatever you want, and I will try to
extract it and send it to you directly. I have got a lot of time! However,
I would like to just send sequence numbers and maybe the definition.

Here is a great opportunity for some dumb sequences. :-)

I like your idea of checking short sequences for possible extension. We
could of course do a similar run for sequences with only n terms in the
first row, 3 < n < k, to narrow it down more.
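
A rough Python sketch of such a run, under stated assumptions: the input is in the OEIS internal format, the first row of terms is the %S line, and the condition is taken to mean that the number of terms n in that row satisfies low < n < high. The function name short_first_rows and both bound parameters are hypothetical, and the upper bound is left for the caller to choose.

```python
def short_first_rows(lines, low, high):
    """Return A-numbers whose %S line holds n terms with low < n < high."""
    hits = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "%S":
            # Drop empty pieces so a trailing comma is not counted as a term.
            terms = [t for t in parts[2].split(",") if t.strip()]
            if low < len(terms) < high:
                hits.append(parts[1])
    return hits
```

This only counts terms in the first row. To restrict it to one liners as well, it would have to be combined with a check that no %T or %U line exists for the same A-number.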

The sequence list is a "living" entity. These are just some of the things
we can do to maintain its vitality.

Have fun,
Cino Hilliard






