My tool for studying the stripped.gz database
Thomas Baruchel
thomas.baruchel at laposte.net
Tue Sep 30 21:38:23 CEST 2003
Hi,
the 17th of september 2003, I wrote a message about an idea
of libguile-based tool for studying the stripped.gz version
of Sloane's database. I have not much time :-( but here is
what I already wrote. It is a scheme interpreter using
libguile, which provides Scheme functions for finding
relations between sequences of the database.
You can download it at
http://baruchel.thomas.free.fr/eisseeker.tar.gz
(initial release 0.1)
The first experiments I tried give good results,
parsing is speed enough for my 1.7Mhz; a gz compressed
files parser allow to use a small 6Mb memory disk
(but since I have enough RAM, I use a 24Mb memory
disk and work on an uncompressed version of the database).
I will not make noise in the seqfan mailing list with
it; I created a mailing list to which you can subscribe if
you are interested. I hope this long message will not
disturb people, but I will not send it again. Thank
you for being comprehensive. You can find informations
about the mailing list below or with the package.
Here is the current manual page, and very below is a script
file with relations detected. I didn't even check for
the interest of these relations, since my current work
only concerns writing the source code, but I send it
anyway. Probably 3/4 of the relations is not really
significant, but I would be proud if 2 or 3 relations
is interesting enough to be mentionned as a comment in
the on-line database.
Cordially,
Thomas baruchel
EISSEEKER(1) EISSEEKER(1)
NAME
eisseeker - guile-based shell for accessing the Encyclopedia of Integer
Sequences and looking for relations between sequences
SYNOPSIS
eisseeker [OPTION] ...
DESCRIPTION
The guile-based shell provides high-level functions in order to access
the Encyclopedia of Integer Sequences on a local file system and look-
ing for relations between sequences. The database is available as a
compressed archive at
http://www.research.att.com/~njas/sequences/Seis.html#FULL (name of the
file is stripped.gz). It is a good idea to copy it on a memory file
system before launching the program. The copy can either be compressed
(.gz format) or uncompressed, since eisseeker can use the zlib library
to access it if needed.
OPTIONS
Options are exactly the same as for any guile shell. Try
eisseeker -h or eisseeker --help
to get the list of options.
Much information is available about guile through the info (1) program.
INSTALLATION
You need to have liguile installed on your system. You may enable or
disable several options at compilation time. Edit the source code, and
remove or add the following lines at the very beginning of the file:
WITH_ZLIB Use zlib to read a .gz compressed database. If this options
is used, don't forget to add -lz at compilation time.
WITH_DEBUG Main purpose is to display how much time was needed by a
loop. It will be very confusing to enable this option at
research time, since several loops will be very probably nested,
but it may be useful to have this information for some specifi-
cal situations.
Then compile the program by linking it with libguile and with the other
libraries that have been declared at the beginning of the file. For
instance:
gcc -I/usr/local/include/ -L/usr/local/lib/ -lz -lguile -Wall -O3 -ansi
-o eisseeker eisseeker.c
NEW SCHEME FUNCTIONS
The shell provides several new Scheme functions in order to access the
database:
eis-seeker-version
This is a symbol defined by the program. The symbol is bound to
the current version of EISSeeker. It is intended to allow cus-
tomization of your ~/.guile file without disturbing other guile
shells. You can put in your configuration file:
(use-modules (ice-9 readline)) ; use readline library
(activate-readline) ; activate readline
(use-modules (ice-9 slib)) ; use SLIB standard library
(if (defined? 'eis-seeker-version) ; this only concern eisseeker
(begin ...))
or (alternate version):
(use-modules (ice-9 readline)) ; use readline library
(activate-readline) ; activate readline
(use-modules (ice-9 slib)) ; use SLIB standard library
(if (false-if-exception (eval 'eis-seeker-version (current-module)))
(begin ...))
You may use this feature for setting the location of the data-
base on your system, to choose which parser is needed, etc.
(eis-mount-database name)
Set the new database to be read to file called name (which has
to be a string).
(eis-umount-database)
Unmount the previously mounted database.
(eis-set-parser parser)
Set the current parser to parser, which has to be a symbol like
'basic or 'gz. Note that 'gz can read either compressed or
uncompressed files.
(eis-for-each proc)
Successively apply proc to each sequence of the database. The
sequences are list objects with an initial item being a symbol
giving the ID of the sequence. Thus the sequence of numbers is
the cdr of the argument received by proc. For instance: (A070214
1 2 5 8 11). All functions providing such loops may be nested.
(eis-for-each-map proc1 proc2)
Low-level implementation of:
(eis-for-each
(lambda (l)
(proc2 (cons (car l) (map proc1 (cdr l))))))
(eis-for-each-test proc1 proc2)
Low-level implementation of:
(eis-for-each
(lambda (l)
(if (proc1 (cdr l)) (proc2 l))))
(eis-for-each-test-map proc1 proc2 proc3)
Low-level implementation of:
(eis-for-each
(lambda (l)
(if (proc1 (cdr l))
(proc3 (cons (car l) (map proc2 (cdr l))))))
(eis-for-each-filter proc1 proc2)
Low-level implementation of:
(eis-for-each
(lambda (l)
(let ((n (proc1 (cdr l))))
(if n (if (not (zero? (length n)))
(proc2 (cons (car l) n)))))))
To be more precise, proc1 is a filter, which means a Scheme pro-
cedure returning either the false boolean value or any transfor-
mation of the sequence. If the filter applied to the sequence
returns false or an empty list, any further process concerning
that sequence is skipped. If the filter returns a valid list,
then proc2 is applied to the transformed list.
(eis-for-each-filter-reparse filter)
Each sequence in the database is read, and filter is applied to
its cdr (in order to remove the ID symbol). If not discarded
(see (eis-for-each-filter)), the filtered sequence is searched
in the database. A verbose output tells the result of the com-
parison. This function allows to systematically looks for trans-
formed sequences in the database. If a transformed sequence
match its initial version, it is skipped in order to avoid too
much noise in the output. Length of the transformed sequence is
displayed in order to help removing unsignificant results.
(eis-for-each-symbolic-filter symbol proc)
Low-level implementation of (eis-for-each-filter ...). It is
much quicker, but the filter has to be hacked in the source
code. This feature is provided in order to discard sequences
that can very easely be scanned in their initial string version
(for instance when considering binary sequences, base notation,
length, etc.). It is much quicker than the other version. If you
want to hack the file, see Hacker'S CORNER. If you don't want,
you can see in the list of predefined symbols if one matches
your needs:
'is-binary? Only keep binary sequences (which is not the same
that base 2 sequences). All numbers of the sequences must be
either 0 or 1.
'is-no-binary? Obvious.
'is-sign? Only keep sequences with at least one negative num-
ber.
'is-nonn? Only keep sequences with no negative number.
'is-base2? Obvious.
'is-no-base2? Obvious.
'is-base3? Obvious.
'is-no-base3? Obvious.
'is-base4? Obvious.
'is-no-base4? Obvious.
'is-base5? Obvious.
'is-no-base5? Obvious.
'is-base6? Obvious.
'is-no-base6? Obvious.
'is-base7? Obvious.
'is-no-base7? Obvious.
'is-base8? Obvious.
'is-no-base8? Obvious.
'is-base9? Obvious.
'is-no-base9? Obvious.
'is-even? All numbers must be even.
'is-odd? All numbers must be odd.
'is-alternate-odd-even? Obvious.
'has-single? Length of the sequence is 1.
'remove-even Transform the sequence by removing even numbers.
Why no more symbolic filters? In fact I would like my program to pro-
vide houndreds of such filters, since they really are much quicker, but
I am not really interested in writing them, though they are really very
easy to implement. I got bored after having written the last one, and I
will perhaps add some more later. But if you need a specific filter
that seems to you interesting to implement at a low level, ask me and I
will write it for you. If you write your own filters, please send your
code to me, and I will very certainly add it to the official release.
Some interesting things could be done: keep/discard sequences beginning
with 0 or 1; only keep sequences of increasing values; map the sequence
to its derivate; a mask system could also be provided in order to
keep/remove numbers at given positions; etc.
(for-each-symbolic-filter-reparse symbol)
This is probably only useful if you hack your own low-level fil-
ters in the source code.
(eis-exit)
Exit the program.
HACKER'S CORNER
The code is intended to be very easy to hack. A good compromise between
full micro-optimization and easy to use prototypes has been found.
Please, don't mind hacking the source code according to your needs. If
you think functions you wrote are of general interest, let me know them
and I will probably add them to the main file or to a separate file.
However only one thing is documented here: how to write symbolic fil-
ters.
A symbolic filter rests on a very simple C function with prototype:
char *myfilter(char *buf)
where buf points to the string version of the sequence (without the ID)
in the following format:
'( a b c d ... )
which is a valid Scheme expression to be evaluated. The filter can scan
the string and eventually change its content as long as it remains a
valid Scheme expression. The sequence is discarded if the function
returns a NULL pointer.
If the changes are not too big, you can do it in the buf buffer; for
instance removing some numbers by replacing them with spaces:
'( a b c d e ) --> '( a c e )
If the changes are big, you may need another buffer. Since you are not
allowed to allocate memory, please use the global variable:
char global_buffer[4*BUFSIZ];
(buf being smaller: size is BUFSIZ). You can then return address
of global_buffer instead of buf.
One restriction has to be kept in mind: you are not allowed to change
the three initial characters of buf, which are "'( ". If you need some-
thing else, like "(list " or "`( ", please use global_buffer.
Once the function is defined, just add a declaration at the very end of
the source code.
add_hentry("is-binary?",&is_binary);
(the 'h' is not a misprint; it stands for 'hash table').
TO BE REMOVED: HACKING
[This section will be removed; it is an old one, and after having
implemented the symbolic filters system, I decided to remove this sec-
tion in the next releases, since the symbolic filters are much more
easy to hack. Besides, I like rather providing few new Scheme func-
tions].
The code is intended to be very easy to hack. A good compromise between
full micro-optimization and easy to use prototypes has been found.
Please, don't mind hacking the source code according to your needs. If
you think functions you wrote are of general interest, let me know them
and I will probably add them to the main file or to a separate file.
Remember eisseeker rests on a general prototype which is
typedef void (*compute_seq_t)(void *,char *,char *);
which means that you can write very quickly a new interface by adding
two C functions to the source code; the second one calls a function
called parser() like this:
parser(myfunc,myarg);
with an argument being the first function and another argument being of
type (void *) referencing to whatever may be wanted. The parser() func-
tion will then read the whole database file, and call func(arg,id,buf);
after having read each sequence in the database. First argument is your
own (void *) pointer; second is a string with seven characters (with no
'\0' at the end) containing the ID of the sequence; third argument is a
string (ending with '\0') containing the sequence in the following for-
mat:
'( a b c d ... n )
which means that there is exactly one space before and after each num-
ber in the sequence. Of course, it is now very easy to convert buf into
a scheme list object.
The parser() function always returns an #unspecified Scheme value.
Don't forget to interface your C function to the Scheme shell at the
end of the file, in the inner_main() function.
TO BE REMOVED: HACKING EXAMPLES
[This section will be removed; it is an old one, and after having
implemented the symbolic filters system, I decided to remove this sec-
tion in the next releases, since the symbolic filters are much more
easy to hack. Besides, I like rather providing few new Scheme func-
tions].
The most basic way of using the parser function can be found by looking
at the (eis-for-each ...) code (see explanations about his function in
the present manual page):
void for_each_(void *lambda,char *id,char *buf) {
scm_call_1(*(SCM *) lambda,
scm_cons(scm_mem2symbol(id,7),
scm_c_eval_string(buf)));
}
SCM for_each(SCM lambda) {
return(parser(for_each_,(void *) &lambda));
}
It is easy to understand that argument of the previous function is a
single Scheme object.
Here is another example with two Scheme objects as arguments of the
high-level Scheme function (eis-for-each-map ...):
void for_each_map_(void *lambda,char *id,char *buf) {
scm_call_1(*(SCM *) (lambda+sizeof(SCM)),
scm_cons(scm_mem2symbol(id,7),
scm_map(*(SCM *) lambda,
scm_c_eval_string(buf),SCM_EOL)));
}
SCM for_each_map(SCM lambda1,SCM lambda2) {
SCM lambda[2] = {lambda1, lambda2};
return(parser(for_each_map_,(void *) &lambda));
}
Of course, the (void *) can reference to anything in memory (a Scheme
object, an array of Scheme objects, a compiled regular expression, an
array of compiled regular expressions, etc.).
TODO
* add "reparse" options (for instance: disable searching and output
when the filtered sequence gets too short to provide significant
results);
* add new functions of the eis-for-each-*-reparse kind, with the Posix
regexp searching operations;
* add new functions of the eis-for-each-*-reparse kind, with some user-
defined Scheme mathematical matching tests (for instance: a matches b
if a divides b, or a matches b if difference is at most 1, etc.);
* add new parsers (maybe for bz2 compressed files);
* maybe add a curses interface.
AUTHOR
Written by Thomas Baruchel.
MAILING LIST
You can subscribe to the mailing list of the EIS Seeker Project by
sending a message with 'subscribe' in the Subject field to eisseeker-
request at freelists.org. You can also subscribe from the webpage:
http://www.freelists.org/list/eisseeker.
Purposes of the mailing list are
* discuss about further evolutions of the program; if you have special
needs, you may either make your suggestion there or by sending a mes-
sage to me;
* discuss about general algorithms that could lead to original ways of
searching relations in the database, and see which could be the best
implementation for them;
* discuss about filters, either at a C level or at a Scheme level;
* discuss about the significance of results (since the "reparsing" sys-
tem may find more than 10-15 relations, among which very few are
really significant).
KNOWN BUGS
Leaving a loop with the help of call-with-current-continuation is not
recommanded, since file being read won't be closed. But if you want to
do it only a few times in a session, it should work (not tried).
I didn't implement many tests in order to prevent the user to give bad
type arguments when calling the new functions. I am not interested in
that, because I write this tool for people of good will who are going
to use it as it is intended to be used. If anyway you think funny to
look for cases where the program gets killed, you will surely find some
(then you can always tell me, I will perhaps add a test), but generally
the libguile library is strong enough to detect them and avoid danger-
ous operations.
REPORTING BUGS
Report bugs to <thomas.baruchel at laposte.net>.
COPYRIGHT
This is free software. There is NO warranty.
SEE ALSO
info guile.info
info guile-tut.info
Encyclopedia of Integer Sequences Seeker EISSEEKER(1)
;;;
;;; script file
;;;
Assuming offset of sequences is 1, filter sequences by
keeping numbers whose position is a Fibonacci number,
then reparse database and looks for matches.
(In order to avoid trivial matches of trivial
sequences with houndreds other trivial sequences,
the filter discards transfomed sequences
for which sum of terms is less than 15000).
[When I tried that filter, there was no length
indication yet]
Filtered A069110 matches A007570
Filtered A023048 matches A083701
Filtered A071071 matches A006263
Filtered A071008 matches A051285
Filtered A071008 matches A014221
Filtered A044432 matches A005203
Filtered A030164 matches A058891
Filtered A000204 matches A068098
Filtered A019434 matches A004249
Filtered A019434 matches A007516
Filtered A053965 matches A053943
Filtered A053965 matches A053955
Filtered A070271 matches A008973
Filtered A003556 matches A020955
Filtered A066289 matches A007539
Filtered A014417 matches A056830
Filtered A006937 matches A008559
Filtered A000042 matches A037842
Filtered A017672 matches A056571
Filtered A000583 matches A056571
Filtered A014960 matches A009967
Filtered A017674 matches A056572
Filtered A017676 matches A056573
Filtered A001014 matches A056573
Filtered A017678 matches A056574
Filtered A001015 matches A056574
Filtered A017680 matches A056585
Filtered A001016 matches A056585
Filtered A047805 matches A008695
Filtered A047804 matches A008691
Filtered A017682 matches A056586
Filtered A001017 matches A056586
Filtered A017684 matches A056587
Filtered A008454 matches A056587
Filtered A085337 matches A085338
Filtered A037147 matches A043303
Filtered A050259 matches A036236
Try same filter by assuming offset is 0, and
discards filtered sequences for which the sum is
less than 5000.
Filtered A000032 matches A068098 (length of filtered sequence is 8)
...
<Ctrl-C>
More information about the SeqFan
mailing list