My tool for studying the stripped.gz database

Tue Sep 30 21:38:23 CEST 2003

Hi,

the 17th of september 2003, I wrote a message about an idea
of libguile-based tool for studying the stripped.gz version
of Sloane's database. I have not much time :-( but here is
what I already wrote. It is a scheme interpreter using
libguile, which provides Scheme functions for finding
relations between sequences of the database.

You can download it at
http://baruchel.thomas.free.fr/eisseeker.tar.gz
(initial release 0.1)

The first experiments I tried give good results,
parsing is speed enough for my 1.7Mhz; a gz compressed
files parser allow to use a small 6Mb memory disk
(but since I have enough RAM, I use a 24Mb memory
disk and work on an uncompressed version of the database).

I will not make noise in the seqfan mailing list with
it; I created a mailing list to which you can subscribe if
you are interested. I hope this long message will not
disturb people, but I will not send it again. Thank
you for being comprehensive. You can find informations
about the mailing list below or with the package.

Here is the current manual page, and very below is a script
file with relations detected. I didn't even check for
the interest of these relations, since my current work
only concerns writing the source code, but I send it
anyway. Probably 3/4 of the relations is not really
significant, but I would be proud if 2 or 3 relations
is interesting enough to be mentionned as a comment in
the on-line database.

Cordially,
                Thomas baruchel

EISSEEKER(1)                                                      EISSEEKER(1)

NAME
       eisseeker - guile-based shell for accessing the Encyclopedia of Integer
       Sequences and looking for relations between sequences

SYNOPSIS
       eisseeker [OPTION] ...

DESCRIPTION
       The guile-based shell provides high-level functions in order to  access
       the  Encyclopedia of Integer Sequences on a local file system and look-
       ing for relations between sequences. The database  is  available  as  a
       compressed                          archive                          at
       http://www.research.att.com/~njas/sequences/Seis.html#FULL (name of the
       file  is  stripped.gz).  It  is a good idea to copy it on a memory file
       system before launching the program. The copy can either be  compressed
       (.gz  format) or uncompressed, since eisseeker can use the zlib library
       to access it if needed.

OPTIONS
       Options are exactly the same as for any guile shell. Try

              eisseeker -h or eisseeker --help

       to get the list of options.

       Much information is available about guile through the info (1) program.

INSTALLATION
       You  need  to  have liguile installed on your system. You may enable or
       disable several options at compilation time. Edit the source code,  and
       remove or add the following lines at the very beginning of the file:

       WITH_ZLIB  Use  zlib to read a .gz compressed database. If this options
              is used, don't forget to add -lz at compilation time.

       WITH_DEBUG Main purpose is to display how much time  was  needed  by  a
              loop.  It  will  be  very  confusing  to  enable  this option at
              research time, since several loops will be very probably nested,
              but  it may be useful to have this information for some specifi-
              cal situations.

       Then compile the program by linking it with libguile and with the other
       libraries  that  have  been  declared at the beginning of the file. For
       instance:

       gcc -I/usr/local/include/ -L/usr/local/lib/ -lz -lguile -Wall -O3 -ansi
       -o eisseeker eisseeker.c

NEW SCHEME FUNCTIONS
       The  shell provides several new Scheme functions in order to access the
       database:

       eis-seeker-version
              This is a symbol defined by the program. The symbol is bound  to
              the  current  version of EISSeeker. It is intended to allow cus-
              tomization of your ~/.guile file without disturbing other  guile
              shells. You can put in your configuration file:
                (use-modules (ice-9 readline)) ; use readline library
                (activate-readline)            ; activate readline
                (use-modules (ice-9 slib))     ; use SLIB standard library
                (if (defined? 'eis-seeker-version)      ; this only concern eisseeker
                  (begin ...))
              or (alternate version):
                (use-modules (ice-9 readline)) ; use readline library
                (activate-readline)            ; activate readline
                (use-modules (ice-9 slib))     ; use SLIB standard library
                (if (false-if-exception (eval 'eis-seeker-version (current-module)))
                  (begin ...))
              You  may  use this feature for setting the location of the data-
              base on your system, to choose which parser is needed, etc.

       (eis-mount-database name)
              Set the new database to be read to file called name  (which  has
              to be a string).

       (eis-umount-database)
              Unmount the previously mounted database.

       (eis-set-parser parser)
              Set  the current parser to parser, which has to be a symbol like
              'basic or 'gz. Note that  'gz  can  read  either  compressed  or
              uncompressed files.

       (eis-for-each proc)
              Successively  apply  proc  to each sequence of the database. The
              sequences are list objects with an initial item being  a  symbol
              giving  the  ID of the sequence. Thus the sequence of numbers is
              the cdr of the argument received by proc. For instance: (A070214
              1 2 5 8 11). All functions providing such loops may be nested.

       (eis-for-each-map proc1 proc2)
              Low-level implementation of:
                (eis-for-each
                  (lambda (l)
                    (proc2 (cons (car l) (map proc1 (cdr l))))))

       (eis-for-each-test proc1 proc2)
              Low-level implementation of:
                (eis-for-each
                  (lambda (l)
                    (if (proc1 (cdr l)) (proc2 l))))

       (eis-for-each-test-map proc1 proc2 proc3)
              Low-level implementation of:
                (eis-for-each
                  (lambda (l)
                    (if (proc1 (cdr l))
                      (proc3 (cons (car l) (map proc2 (cdr l))))))

       (eis-for-each-filter proc1 proc2)
              Low-level implementation of:
                (eis-for-each
                  (lambda (l)
                    (let ((n (proc1 (cdr l))))
                      (if n (if (not (zero? (length n)))
                        (proc2 (cons (car l) n)))))))
              To be more precise, proc1 is a filter, which means a Scheme pro-
              cedure returning either the false boolean value or any transfor-
              mation  of  the  sequence. If the filter applied to the sequence
              returns false or an empty list, any further  process  concerning
              that  sequence  is  skipped. If the filter returns a valid list,
              then proc2 is applied to the transformed list.

       (eis-for-each-filter-reparse filter)
              Each sequence in the database is read, and filter is applied  to
              its  cdr  (in  order  to remove the ID symbol). If not discarded
              (see (eis-for-each-filter)), the filtered sequence  is  searched
              in  the  database. A verbose output tells the result of the com-
              parison. This function allows to systematically looks for trans-
              formed  sequences  in  the  database.  If a transformed sequence
              match its initial version, it is skipped in order to  avoid  too
              much  noise in the output. Length of the transformed sequence is
              displayed in order to help removing unsignificant results.

       (eis-for-each-symbolic-filter symbol proc)
              Low-level implementation  of  (eis-for-each-filter ...).  It  is
              much  quicker,  but  the  filter  has to be hacked in the source
              code. This feature is provided in  order  to  discard  sequences
              that  can very easely be scanned in their initial string version
              (for instance when considering binary sequences, base  notation,
              length, etc.). It is much quicker than the other version. If you
              want to hack the file, see Hacker'S CORNER. If you  don't  want,
              you  can  see  in  the list of predefined symbols if one matches
              your needs:

               'is-binary?  Only keep binary sequences (which is not the  same
               that  base  2  sequences). All numbers of the sequences must be
               either 0 or 1.

               'is-no-binary?  Obvious.

               'is-sign?  Only keep sequences with at least one negative  num-
               ber.

               'is-nonn?  Only keep sequences with no negative number.

               'is-base2?  Obvious.

               'is-no-base2?  Obvious.

               'is-base3?  Obvious.

               'is-no-base3?  Obvious.

               'is-base4?  Obvious.

               'is-no-base4?  Obvious.

               'is-base5?  Obvious.

               'is-no-base5?  Obvious.

               'is-base6?  Obvious.

               'is-no-base6?  Obvious.

               'is-base7?  Obvious.

               'is-no-base7?  Obvious.

               'is-base8?  Obvious.

               'is-no-base8?  Obvious.

               'is-base9?  Obvious.

               'is-no-base9?  Obvious.

               'is-even?  All numbers must be even.

               'is-odd?  All numbers must be odd.

               'is-alternate-odd-even?  Obvious.

               'has-single?  Length of the sequence is 1.

               'remove-even Transform the sequence by removing even numbers.

       Why  no  more symbolic filters? In fact I would like my program to pro-
       vide houndreds of such filters, since they really are much quicker, but
       I am not really interested in writing them, though they are really very
       easy to implement. I got bored after having written the last one, and I
       will  perhaps  add  some  more later. But if you need a specific filter
       that seems to you interesting to implement at a low level, ask me and I
       will  write it for you. If you write your own filters, please send your
       code to me, and I will very certainly add it to the  official  release.
       Some interesting things could be done: keep/discard sequences beginning
       with 0 or 1; only keep sequences of increasing values; map the sequence
       to  its  derivate;  a  mask  system  could also be provided in order to
       keep/remove numbers at given positions; etc.

       (for-each-symbolic-filter-reparse symbol)
              This is probably only useful if you hack your own low-level fil-
              ters in the source code.

       (eis-exit)
              Exit the program.

HACKER'S CORNER
       The code is intended to be very easy to hack. A good compromise between
       full micro-optimization and easy to  use  prototypes  has  been  found.
       Please,  don't mind hacking the source code according to your needs. If
       you think functions you wrote are of general interest, let me know them
       and I will probably add them to the main file or to a separate file.

       However  only  one thing is documented here: how to write symbolic fil-
       ters.

       A symbolic filter rests on a very simple C function with prototype:

       char *myfilter(char *buf)

       where buf points to the string version of the sequence (without the ID)
       in the following format:

       '( a b c d ... )

       which is a valid Scheme expression to be evaluated. The filter can scan
       the string and eventually change its content as long as  it  remains  a
       valid  Scheme  expression.  The  sequence  is discarded if the function
       returns a NULL pointer.

       If the changes are not too big, you can do it in the  buf  buffer;  for
       instance removing some numbers by replacing them with spaces:

       '( a b c d e ) --> '( a   c   e )

       If  the changes are big, you may need another buffer. Since you are not
       allowed to allocate memory, please use the global variable:

       char global_buffer[4*BUFSIZ];
              (buf being smaller: size is BUFSIZ). You can then return address
              of global_buffer instead of buf.

       One  restriction  has to be kept in mind: you are not allowed to change
       the three initial characters of buf, which are "'( ". If you need some-
       thing else, like "(list " or "`( ", please use global_buffer.

       Once the function is defined, just add a declaration at the very end of
       the source code.

       add_hentry("is-binary?",&is_binary);

       (the 'h' is not a misprint; it stands for 'hash table').

TO BE REMOVED: HACKING
       [This section will be removed; it is  an  old  one,  and  after  having
       implemented  the symbolic filters system, I decided to remove this sec-
       tion in the next releases, since the symbolic  filters  are  much  more
       easy  to  hack.  Besides,  I like rather providing few new Scheme func-
       tions].

       The code is intended to be very easy to hack. A good compromise between
       full  micro-optimization  and  easy  to  use prototypes has been found.
       Please, don't mind hacking the source code according to your needs.  If
       you think functions you wrote are of general interest, let me know them
       and I will probably add them to the main file or to a separate file.

       Remember eisseeker rests on a general prototype which is

       typedef void (*compute_seq_t)(void *,char *,char *);

       which means that you can write very quickly a new interface  by  adding
       two  C  functions  to  the source code; the second one calls a function
       called parser() like this:

       parser(myfunc,myarg);

       with an argument being the first function and another argument being of
       type (void *) referencing to whatever may be wanted. The parser() func-
       tion will then read the whole database file, and call func(arg,id,buf);
       after having read each sequence in the database. First argument is your
       own (void *) pointer; second is a string with seven characters (with no
       '\0' at the end) containing the ID of the sequence; third argument is a
       string (ending with '\0') containing the sequence in the following for-
       mat:

       '( a b c d ... n )

       which  means that there is exactly one space before and after each num-
       ber in the sequence. Of course, it is now very easy to convert buf into
       a scheme list object.

       The parser() function always returns an #unspecified Scheme value.

       Don't  forget  to  interface your C function to the Scheme shell at the
       end of the file, in the inner_main() function.

TO BE REMOVED: HACKING EXAMPLES
       [This section will be removed; it is  an  old  one,  and  after  having
       implemented  the symbolic filters system, I decided to remove this sec-
       tion in the next releases, since the symbolic  filters  are  much  more
       easy  to  hack.  Besides,  I like rather providing few new Scheme func-
       tions].

       The most basic way of using the parser function can be found by looking
       at  the (eis-for-each ...) code (see explanations about his function in
       the present manual page):

       void for_each_(void *lambda,char *id,char *buf) {
         scm_call_1(*(SCM *) lambda,
           scm_cons(scm_mem2symbol(id,7),
                    scm_c_eval_string(buf)));
       }
       SCM for_each(SCM lambda) {
         return(parser(for_each_,(void *) &lambda));
       }

       It is easy to understand that argument of the previous  function  is  a
       single Scheme object.

       Here  is  another  example  with two Scheme objects as arguments of the
       high-level Scheme function (eis-for-each-map ...):

       void for_each_map_(void *lambda,char *id,char *buf) {
         scm_call_1(*(SCM *) (lambda+sizeof(SCM)),
           scm_cons(scm_mem2symbol(id,7),
                    scm_map(*(SCM *) lambda,
                    scm_c_eval_string(buf),SCM_EOL)));
       }
       SCM for_each_map(SCM lambda1,SCM lambda2) {
         SCM lambda[2] = {lambda1, lambda2};
         return(parser(for_each_map_,(void *) &lambda));
       }

       Of course, the (void *) can reference to anything in memory  (a  Scheme
       object,  an  array of Scheme objects, a compiled regular expression, an
       array of compiled regular expressions, etc.).

TODO
       * add "reparse" options (for instance:  disable  searching  and  output
         when  the  filtered  sequence  gets  too short to provide significant
         results);

       * add new functions of the eis-for-each-*-reparse kind, with the  Posix
         regexp searching operations;

       * add new functions of the eis-for-each-*-reparse kind, with some user-
         defined Scheme mathematical matching tests (for instance: a matches b
         if a divides b, or a matches b if difference is at most 1, etc.);

       * add new parsers (maybe for bz2 compressed files);

       * maybe add a curses interface.

AUTHOR
       Written by Thomas Baruchel.

MAILING LIST
       You  can  subscribe  to  the  mailing list of the EIS Seeker Project by
       sending a message with 'subscribe' in the Subject field  to  eisseeker-
       request at freelists.org.   You  can  also  subscribe  from  the  webpage:
       http://www.freelists.org/list/eisseeker.

       Purposes of the mailing list are

       * discuss about further evolutions of the program; if you have  special
         needs, you may either make your suggestion there or by sending a mes-
         sage to me;

       * discuss about general algorithms that could lead to original ways  of
         searching  relations in the database, and see which could be the best
         implementation for them;

       * discuss about filters, either at a C level or at a Scheme level;

       * discuss about the significance of results (since the "reparsing" sys-
         tem  may  find  more  than  10-15 relations, among which very few are
         really significant).

KNOWN BUGS
       Leaving a loop with the help of call-with-current-continuation  is  not
       recommanded,  since file being read won't be closed. But if you want to
       do it only a few times in a session, it should work (not tried).

       I didn't implement many tests in order to prevent the user to give  bad
       type  arguments  when calling the new functions. I am not interested in
       that, because I write this tool for people of good will who  are  going
       to  use  it  as it is intended to be used. If anyway you think funny to
       look for cases where the program gets killed, you will surely find some
       (then you can always tell me, I will perhaps add a test), but generally
       the libguile library is strong enough to detect them and avoid  danger-
       ous operations.

REPORTING BUGS
       Report bugs to <thomas.baruchel at laposte.net>.

COPYRIGHT
       This is free software. There is NO warranty.

SEE ALSO
       info guile.info
       info guile-tut.info

                   Encyclopedia of Integer Sequences Seeker       EISSEEKER(1)

;;;
;;; script file
;;;
Assuming offset of sequences is 1, filter sequences by
keeping numbers whose position is a Fibonacci number,
then reparse database and looks for matches.
(In order to avoid trivial matches of trivial
sequences with houndreds other trivial sequences,
the filter discards transfomed sequences
for which sum of terms is less than 15000).
[When I tried that filter, there was no length
indication yet]

Filtered A069110 matches A007570
Filtered A023048 matches A083701
Filtered A071071 matches A006263
Filtered A071008 matches A051285
Filtered A071008 matches A014221
Filtered A044432 matches A005203
Filtered A030164 matches A058891
Filtered A000204 matches A068098
Filtered A019434 matches A004249
Filtered A019434 matches A007516
Filtered A053965 matches A053943
Filtered A053965 matches A053955
Filtered A070271 matches A008973
Filtered A003556 matches A020955
Filtered A066289 matches A007539
Filtered A014417 matches A056830
Filtered A006937 matches A008559
Filtered A000042 matches A037842
Filtered A017672 matches A056571
Filtered A000583 matches A056571
Filtered A014960 matches A009967
Filtered A017674 matches A056572
Filtered A017676 matches A056573
Filtered A001014 matches A056573
Filtered A017678 matches A056574
Filtered A001015 matches A056574
Filtered A017680 matches A056585
Filtered A001016 matches A056585
Filtered A047805 matches A008695
Filtered A047804 matches A008691
Filtered A017682 matches A056586
Filtered A001017 matches A056586
Filtered A017684 matches A056587
Filtered A008454 matches A056587
Filtered A085337 matches A085338
Filtered A037147 matches A043303
Filtered A050259 matches A036236

Try same filter by assuming offset is 0, and
discards filtered sequences for which the sum is
less than 5000.
Filtered A000032 matches A068098 (length of filtered sequence is 8)
...
<Ctrl-C>