[seqfan] Re: Extended search in OEIS data with GCD reduction

Neil Sloane njasloane at gmail.com
Tue Oct 13 19:50:09 CEST 2020


Hugo, I think that is a great idea.  I am all in favor of it, and
especially of incorporating it into Superseeker.

Best regards
Neil

Neil J. A. Sloane, President, OEIS Foundation.
11 South Adelaide Avenue, Highland Park, NJ 08904, USA.
Also Visiting Scientist, Math. Dept., Rutgers University, Piscataway, NJ.
Phone: 732 828 6098; home page: http://NeilSloane.com
Email: njasloane at gmail.com



On Tue, Oct 13, 2020 at 12:31 PM Hugo Pfoertner <yae9911 at gmail.com> wrote:

> Dear SeqFans,
>
> I would like to present a more general contribution here.
>
> For a long time I have been thinking about how I can fix a fundamental
> weakness in the search function and even in Superseeker for my own use. It
> should be known that sequences that are only stored with a common factor
> are not found when searching for a sequence in which this factor has been
> divided out, or that uses a common factor other than that of the stored
> data.
>
> If we assume a hypothetical database entry 30, 70, 80, 100, 140, 160, 240,
> then a search for 3,7,8,10,14,16,24 performed with either the normal OEIS
> search function or with the Superseeker will not lead to a hit, although it
> might be useful or at least interesting to get a reference to 30, 70, 80,
> 100, 140, ... in this case.
>
> A practical example happened recently when Don Knuth submitted a new
> sequence A337274 <https://oeis.org/A337274>. Only a little later did he
> find out that there was already an existing sequence A033472
> <https://oeis.org/A033472> on this topic, which, with the exception of the
> initial terms, was exactly Don's sequence times 2.
>
> So I thought it might be useful if a search could access data in which
> common factors were divided out. Because in very many cases there are small
> initial terms that make it impossible to extract factors, I decided to
> start such a search for factors only with sufficiently large terms, or to
> run the search backwards from the end of the sequence. In many cases, it
> turns out that even smaller terms have the same factor that is found when
> the search e.g. is only carried out for numbers> = 100. The method is
> hardly applicable to "tabl" sequences, but structured search in tables is
> another huge topic to be adressed.
>
> I tried such a search for common factors with a range 100 <= x <= 2^63,
> over the entire data fields of all sequences, and when I found a GCD > 1,
> then in a second run the contiguous range was determined in which all
> successive numbers were divisible by the factor. If at least 4 consecutive
> numbers were found, then the result was output to a file. This file with
> more than 50,000 lines (one line per A number) is welcome to look at. It is
> saved under
>
> http://www.randomwalk.de/sequences/gcd0.txt
>
> The format of these files is one line per A-number, starting with the
> A-number without the "A", then the GCD, the number of divided items and
> then the list of data, which is potentially shorter than the original DATA,
> due to excluded non-divisible initial terms and very large terms in the
> tail of the DATA.
>
> To my support, Georg Fischer carried out a similar search on the basis of
> all b-files and, with a slightly different method, and came up with more
> than 45,000 hits from sequences with GCD > 1.
>
> Since the situation occurs quite often that sequences have the form
> "composite number + 1" or "composite number - 1", I applied the same action
> to a total data set that was incremented or decremented by 1.
> This means that the prime number sequence A000040 in both variants (p +
> 1)/2 and (p-1)/2 becomes part of the result lists. In both cases, i.e., the
> entire database shifted by plus 1 or minus 1, approx. 50,000 data records
> are found, from which a common factor can be divided out. Because I
> excluded sequences in which more than one negative number occurs, the
> number of sequences that can be reduced by a common factor is probably even
> slightly larger.
>
> At the moment, the 3 files are only updated sporadically and not regularly.
> The "shifted" results are given in
> http://www.randomwalk.de/sequences/gcdminus1.txt
> http://www.randomwalk.de/sequences/gcdplus1.txt
>
> As long as I only use it myself, I can generate a current version from
> "stripped" at any time. The extraction program for generating the reduced
> sequences is a Fortran program that is still in a sketchy state, and there
> is still a lot that can be cleaned up towards the production version, but
> it works.
> In addition to the three files named gcd * .txt, it currently generates a
> number of other outputs that have helped me understand the program
> function, among them an information file
> http://www.randomwalk.de/sequences/stripped_stats.txt
> which contains the number of commas (# of data +1) and the number of minus
> signs per data line and thus gives a quick overview of how many searchable
> terms (excluding b-files) there are for each database entry and whether it
> is a "sign" entry.
>
> You can search in the result files with a text editor or with grep.
>
> But my real goal was to use this in an advanced search by program. If you
> are looking for a database with GCD that has been divided out, then you
> should of course also eliminate common factors from the number series of
> the search input.
>
> Since the programming language in which I can create a working program with
> access to files and some calculation with the least effort and safely is
> still the good old Fortran, I implemented the search function and the
> conversion program in this language.
> The search function is based on the files mentioned. The source code of the
> programs is available in a zip file
> http://www.randomwalk.de/sequences/oeisearch.zip
>
> The program is commented so far that it should be easy to implement it
> yourself in any other programming language.
>
> The GCD calculation is performed with the subroutine gcdn, which implements
> the ACM algorithm 386 by T. Bradley from 1970.
> https://dl.acm.org/doi/10.1145/362686.362697
> A version that has been trivially adapted to 8-byte integer is at the end
> of my Fortran file. The original can be obtained from
> http://netlib.org/toms/386.gz
>
> On Linux systems this can be compiled with the Gnu-Fortran compiler
> gfortran.
>
> For Windows users I provide a command line executable oeisearch (see link
> to readme below how to access and install).
>
> For those who are interested in trying this out, the necessary information
> for installation and examples of its use are given in
>
> http://www.randomwalk.de/sequences/oeisearch_readme.txt
>
> I can assure you that it's fun, and I've already made some amazing
> discoveries with recently submitted new sequences, often just converting
> the input variants (+ -1, GCD) to hits in the original OEIS data before the
> GCD-reduced data is accessed at all.
>
> If I can dream a little, then a combination of Superseeker with the
> GCD-reduced data would be a significant extension of the hit area.
> Superseeker users already know the references to near hits, and then an
> extended reference, like
> "transformation Txx produces a match in (Axxxxxx (n) -1) / 24" would also
> be interpretable.
>
> I look forward to any kind of feedback and hope for a lively discussion.
>
> Best regards
>
> Hugo Pfoertner
>
> --
> Seqfan Mailing list - http://list.seqfan.eu/
>



More information about the SeqFan mailing list