ISDB in MySQL

David Wilson davidwwilson at attbi.com
Mon Jun 23 01:07:51 CEST 2003


Well, I have managed to convert the text ISDB into a MySQL database.

The conversion process is as follows:  I use a Perl program to convert the
text ISDB into MySQL import files.  Then I use a "mysql" script to initialize
the MySQL database, and "mysqlimport" to import the data. It takes about
5 minutes to convert the entire database, and I haven't even got the process
fully automated.

At first, I tried to do some data massaging during the process (e.g, I tried
to parse the programming language out of the %o lines).  After flailing
around a while, I realized that these issues would be much easier to
contend with once the database was converted to MySQL.  Therefore I
did very little data massaging.  The only significant massaging was
replacing %STUVWX lines with a single entry that included a complete
list of signed elements.

I have tested accessing the database via the "mysql" utility and via
Perl DBD::mysql.  Admittedly, the database is on the local host, but the
results are quite pleasing.  Querying for sequence info by A-number
turns around almost instantly, and querying for entries by embedded
words is very fast.  I am just at the beginning of my experiments, but I
think that the existing search capabilities of the text ISDB can be
easily duplicated.  Indeed, more advanced search options, such as
searching specific entries, omitting specific entries from the search,
or restricting entries in the search output are easily implemented.

My next goal is to write a Perl script that sorts the sequences according
to the canonical method, then assign integer sort keys to each sequence
in canonical order.  This will enable subsequent MySQL commands to
sort return values in canonical order without having to compare elements.

Having the ISDB in MySQL format also makes it amenable to various
web techs, such as Perl DBI+CGI and PHP, which have full support
for MySQL queries.  I plan to experiment with some of these.








More information about the SeqFan mailing list