[seqfan] Re: help needed with Mediawiki's Lucene-search extension

Joerg Arndt arndt at jjj.de
Sat Dec 19 07:06:44 CET 2009


* N. J. A. Sloane <njas at research.att.com> [Dec 19. 2009 14:03]:
> Joerg said:
> 
> > ...and it works fantastically well.
> If at all possible, I suggest keeping it
> for the search of sequences.
> 
> The trouble is, Russ Cox's search takes its data from 
> "cat25", which is the big flat file that contains 
> ALL the sequences, in the internal format.
> 
> wc cat25
>   2612489  21440539 176452802 cat25

approx 176MB (only)?

Easily fits into the buffer as long as the machine
has a decent amount of RAM.

As long as 1 seq is approx 1 kB, the mechanism
should be fine even if we have 1 million seqs.
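
A rough back-of-the-envelope check of that estimate, as a minimal
Python sketch. The byte count is taken from the quoted "wc cat25"
output; the figure of 170,000 sequences is the approximate count
mentioned in the quoted message, so the per-sequence size is only
an approximation. The function names are made up for illustration.

```python
# Rough size estimate for the flat sequence file.
# CAT25_BYTES comes from the quoted `wc cat25` output;
# NUM_SEQS is the approximate count from the quoted message (an assumption).
CAT25_BYTES = 176_452_802
NUM_SEQS = 170_000


def bytes_per_seq(total_bytes=CAT25_BYTES, num_seqs=NUM_SEQS):
    """Average number of bytes one sequence occupies in the flat file."""
    return total_bytes / num_seqs


def projected_bytes(num_seqs, per_seq=None):
    """Projected flat-file size if the average entry size stays the same."""
    if per_seq is None:
        per_seq = bytes_per_seq()
    return num_seqs * per_seq


if __name__ == "__main__":
    print(f"average entry size: {bytes_per_seq():.0f} bytes")
    gib = projected_bytes(1_000_000) / 2**30
    print(f"projected size for 1 million seqs: {gib:.2f} GiB")
```

Under these assumptions one entry averages a bit more than 1 kB, and
a million entries would come to roughly 1 GB, which still fits in the
buffer of a machine with several GB of RAM.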


> 
> But once the wiki is stabilized, cat25 will go away.
> It would require a huge amount of work to modify
> Russ's program so that it takes its data from the 
> 170,000 individual wiki pages, all written in
> wiki language.  (Of course we considered this, 
> it was our first choice.  But it won't work.)

Make this a low-priority job (ionice) in the background.
Even better if there is a spare CPU core for it.
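
One way this could look, as a minimal Python sketch: wrap the job in
"ionice -c 3" (idle I/O class) and "nice -n 19" (lowest CPU priority)
and start it in the background. This assumes a Linux machine with the
ionice and nice utilities installed. The script name
"./rebuild_flatfile.sh" is purely hypothetical and stands for whatever
program would do the actual work.

```python
import subprocess


def low_priority_command(cmd):
    """Prefix cmd so that it runs with idle I/O class and lowest CPU priority."""
    # ionice -c 3 : idle I/O scheduling class (util-linux)
    # nice -n 19  : lowest CPU scheduling priority
    return ["ionice", "-c", "3", "nice", "-n", "19", *cmd]


def start_in_background(cmd):
    """Start cmd at low priority without waiting for it to finish."""
    return subprocess.Popen(low_priority_command(cmd))


if __name__ == "__main__":
    # "./rebuild_flatfile.sh" is a hypothetical placeholder, not a real script.
    proc = start_in_background(["./rebuild_flatfile.sh"])
    print("started background job with PID", proc.pid)
```

The idle I/O class only gives the job disk time when nothing else
wants it, so interactive searches should not notice it much.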

I hope the machine has at least 8 GB of RAM,
the more the better;  ECC being a must.

> 
> Neil
> 

cheers,   jj



