[seqfan] Re: Extended search in OEIS data with GCD reduction

Hugo Pfoertner yae9911 at gmail.com
Wed Nov 23 16:15:23 CET 2022


Since I occasionally get questions about not always obvious relationships
to existing sequences found using the program described in the October 2020
post, I would like to point out that the program and regularly updated data
are now available as a project on GitHub.com:

https://github.com/HugoPfoertner/OEIS-Search-GCD-reduced

If anyone wants to use this and has questions, please let me know!

On Tue, Oct 13, 2020 at 6:31 PM Hugo Pfoertner <yae9911 at gmail.com> wrote:

> Dear SeqFans,
>
> I would like to present a more general contribution here.
>
> For a long time I have been thinking about how I can fix a fundamental
> weakness in the search function and even in Superseeker for my own use. It
> should be known that sequences that are only stored with a common factor
> are not found when searching for a sequence in which this factor has been
> divided out, or that uses a common factor other than that of the stored
> data.
>
> If we assume a hypothetical database entry 30, 70, 80, 100, 140, 160, 240,
> then a search for 3,7,8,10,14,16,24 performed with either the normal OEIS
> search function or with the Superseeker will not lead to a hit, although it
> might be useful or at least interesting to get a reference to 30, 70, 80,
> 100, 140, ... in this case.
>
> A practical example happened recently when Don Knuth submitted a new
> sequence A337274 <https://oeis.org/A337274>. Only a little later did he
> find out that there was already an existing sequence A033472
> <https://oeis.org/A033472> on this topic, which, with the exception of
> the initial terms, was exactly Don's sequence times 2.
>
> So I thought it might be useful if a search could access data in which
> common factors were divided out. Because in very many cases there are small
> initial terms that make it impossible to extract factors, I decided to
> start such a search for factors only with sufficiently large terms, or to
> run the search backwards from the end of the sequence. In many cases, it
> turns out that even smaller terms have the same factor that is found when
> the search e.g. is only carried out for numbers> = 100. The method is
> hardly applicable to "tabl" sequences, but structured search in tables is
> another huge topic to be adressed.
>
> I tried such a search for common factors with a range 100 <= x <= 2^63,
> over the entire data fields of all sequences, and when I found a GCD > 1,
> then in a second run the contiguous range was determined in which all
> successive numbers were divisible by the factor. If at least 4 consecutive
> numbers were found, then the result was output to a file. This file with
> more than 50,000 lines (one line per A number) is welcome to look at. It is
> saved under
>
> http://www.randomwalk.de/sequences/gcd0.txt
>
> The format of these files is one line per A-number, starting with the
> A-number without the "A", then the GCD, the number of divided items and
> then the list of data, which is potentially shorter than the original DATA,
> due to excluded non-divisible initial terms and very large terms in the
> tail of the DATA.
>
> To my support, Georg Fischer carried out a similar search on the basis of
> all b-files and, with a slightly different method, and came up with more
> than 45,000 hits from sequences with GCD > 1.
>
> Since the situation occurs quite often that sequences have the form
> "composite number + 1" or "composite number - 1", I applied the same action
> to a total data set that was incremented or decremented by 1.
> This means that the prime number sequence A000040 in both variants (p +
> 1)/2 and (p-1)/2 becomes part of the result lists. In both cases, i.e., the
> entire database shifted by plus 1 or minus 1, approx. 50,000 data records
> are found, from which a common factor can be divided out. Because I
> excluded sequences in which more than one negative number occurs, the
> number of sequences that can be reduced by a common factor is probably even
> slightly larger.
>
> At the moment, the 3 files are only updated sporadically and not
> regularly. The "shifted" results are given in
> http://www.randomwalk.de/sequences/gcdminus1.txt
> http://www.randomwalk.de/sequences/gcdplus1.txt
>
> As long as I only use it myself, I can generate a current version from
> "stripped" at any time. The extraction program for generating the reduced
> sequences is a Fortran program that is still in a sketchy state, and there
> is still a lot that can be cleaned up towards the production version, but
> it works.
> In addition to the three files named gcd * .txt, it currently generates a
> number of other outputs that have helped me understand the program
> function, among them an information file
> http://www.randomwalk.de/sequences/stripped_stats.txt
> which contains the number of commas (# of data +1) and the number of minus
> signs per data line and thus gives a quick overview of how many searchable
> terms (excluding b-files) there are for each database entry and whether it
> is a "sign" entry.
>
> You can search in the result files with a text editor or with grep.
>
> But my real goal was to use this in an advanced search by program. If you
> are looking for a database with GCD that has been divided out, then you
> should of course also eliminate common factors from the number series of
> the search input.
>
> Since the programming language in which I can create a working program
> with access to files and some calculation with the least effort and safely
> is still the good old Fortran, I implemented the search function and the
> conversion program in this language.
> The search function is based on the files mentioned. The source code of
> the programs is available in a zip file
> http://www.randomwalk.de/sequences/oeisearch.zip
>
> The program is commented so far that it should be easy to implement it
> yourself in any other programming language.
>
> The GCD calculation is performed with the subroutine gcdn, which
> implements the ACM algorithm 386 by T. Bradley from 1970.
> https://dl.acm.org/doi/10.1145/362686.362697
> A version that has been trivially adapted to 8-byte integer is at the end
> of my Fortran file. The original can be obtained from
> http://netlib.org/toms/386.gz
>
> On Linux systems this can be compiled with the Gnu-Fortran compiler
> gfortran.
>
> For Windows users I provide a command line executable oeisearch (see link
> to readme below how to access and install).
>
> For those who are interested in trying this out, the necessary information
> for installation and examples of its use are given in
>
> http://www.randomwalk.de/sequences/oeisearch_readme.txt
>
> I can assure you that it's fun, and I've already made some amazing
> discoveries with recently submitted new sequences, often just converting
> the input variants (+ -1, GCD) to hits in the original OEIS data before the
> GCD-reduced data is accessed at all.
>
> If I can dream a little, then a combination of Superseeker with the
> GCD-reduced data would be a significant extension of the hit area.
> Superseeker users already know the references to near hits, and then an
> extended reference, like
> "transformation Txx produces a match in (Axxxxxx (n) -1) / 24" would also
> be interpretable.
>
> I look forward to any kind of feedback and hope for a lively discussion.
>
> Best regards
>
> Hugo Pfoertner
>
>



More information about the SeqFan mailing list