[seqfan] Re: help needed with Mediawiki's Lucene-search extension
arndt at jjj.de
Sat Dec 19 07:06:44 CET 2009
* N. J. A. Sloane <njas at research.att.com> [Dec 19. 2009 14:03]:
> Joerg said:
> > ...and it works fantastically well.
> If at all possible, I suggest keeping it
> for the search of sequences.
> The trouble is, Russ Cox's search takes its data from
> "cat25", which is the big flat file that contains
> ALL the sequences, in the internal format.
> wc cat25
> 2612489 21440539 176452802 cat25
Only approx. 176 MB?
That easily fits into the buffer cache as long as the machine
has a decent amount of RAM.
As long as one sequence takes approx. 1 kB, the mechanism
should work fine even with 1 million sequences.
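The size estimate can be checked with a little arithmetic. A minimal sketch, assuming roughly 170,000 sequences (the page count given in the quoted text of this thread) and the byte count reported by `wc cat25`: dividing one by the other gives about 1 kB per entry, so 1 million sequences would need roughly 1 GB.

```python
# Back-of-the-envelope check of the memory estimate.
# Both inputs are approximate: the byte count is the third column of
# the `wc cat25` output, the sequence count is the ~170,000 quoted here.

CAT25_BYTES = 176_452_802   # size of cat25 in bytes
NUM_SEQUENCES = 170_000     # approximate number of sequences


def bytes_per_sequence(total_bytes: int, count: int) -> float:
    """Average size of one sequence entry in the flat file."""
    return total_bytes / count


def projected_size(per_seq: float, count: int) -> float:
    """Projected flat-file size in bytes for `count` sequences."""
    return per_seq * count


per_seq = bytes_per_sequence(CAT25_BYTES, NUM_SEQUENCES)
million = projected_size(per_seq, 1_000_000)

print(f"cat25: {CAT25_BYTES / 1e6:.0f} MB")            # cat25: 176 MB
print(f"per sequence: {per_seq:.0f} bytes")            # per sequence: 1038 bytes
print(f"1 million sequences: {million / 1e9:.2f} GB")  # 1 million sequences: 1.04 GB
```

Even at 1 million sequences the flat file would stay around 1 GB, well within the RAM of a reasonably equipped server.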
> But once the wiki is stabilized, cat25 will go away.
> It would require a huge amount of work to modify
> Russ's program so that it takes its data from the
> 170,000 individual wiki pages, all written in
> wiki language. (Of course we considered this,
> it was our first choice. But it won't work.)
Make this a low-priority background job (ionice).
Even better if a CPU core can be dedicated to it.
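One way to run such a job at low priority on Linux is to wrap it with `ionice -c 3` (idle I/O scheduling class) and `nice -n 19` (lowest CPU priority). A minimal sketch, assuming a Linux host with util-linux and coreutils installed; the script name `rebuild_flatfile.sh` is hypothetical, since the thread does not name the actual job.

```python
import shlex
import subprocess


def low_priority_command(cmd: list[str]) -> list[str]:
    """Wrap `cmd` so it runs with idle I/O priority and lowest CPU priority.

    `ionice -c 3` selects the idle I/O class (the job only gets disk time
    when nothing else wants it); `nice -n 19` is the lowest CPU priority.
    """
    return ["ionice", "-c", "3", "nice", "-n", "19", *cmd]


def start_in_background(cmd: list[str]) -> subprocess.Popen:
    """Start the wrapped command without waiting for it to finish."""
    return subprocess.Popen(low_priority_command(cmd))


# Hypothetical rebuild script; not part of the original discussion.
job = ["./rebuild_flatfile.sh"]
print(shlex.join(low_priority_command(job)))
# ionice -c 3 nice -n 19 ./rebuild_flatfile.sh
```

Calling `start_in_background(job)` would launch the job and return immediately; interactive web searches keep priority for both disk and CPU.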
I hope the machine has at least 8 GB of RAM;
the more the better, and ECC memory is a must.