[seqfan] Pathological values of n for the backtracking process in the computation of the n-queens puzzle

Thomas Baruchel baruchel at gmx.com
Sat Sep 29 19:30:09 CEST 2018


Dear fellows,

The following lines are not really related to Peter Luschny's message,
but, as I told him separately, I was very surprised to read his question,
because I was myself thinking about the backtracking process in the usual
recursive implementation of the n-queens solver.

I was recently able to compare two different implementations of the
same backtracking process; while both versions run over the exact same tree
of possible locations, some benchmarks make me think that one of them has
a better complexity than the other (which is different from what I wrote
to Peter Luschny before having checked).

The following lines are about implementing the algorithm, but mathematical
questions will arise after that.

Here are the two ideas:

   * the "usual" implementation pushes on a stack the queens above the next
     row to be filled; then for each of the n cells of the row, it checks whether
     the given cell is attacked by one of the queens above ;
   * my other version uses a n * (3n-2) grid; the "true" n*n grid is located at
     the middle (horizontally) of this larger grid; at the beginning of the recursion,
     all cells contain 0; the values give the number of attacking queens; then each
     time a new queen if found, the values of the attacked cells below are incremented
     by 1 and the recursion goes deeper; of course, before leaving the recursive call,
     the cells are decremented to their previous value (using a n * (3n-2) grid allows
     to increment/decrement very quickly without having to care about the borders).
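
To make the indexing in the larger grid concrete, here is a minimal sketch
(the names n, width and grid are mine, for illustration only) of how a queen
placed at row i, board column k marks the cells it attacks below; since the
padded indices always fall between 0 and 3n-3, no border test is needed:

n = 4
width = 3*n - 2                  # n real columns plus n-1 of padding on each side
grid = [ [0]*width for _ in range(n) ]

i, k = 0, 1                      # a queen in row 0, board column 1
for j in range(i+1, n):          # mark the cells it attacks below
    grid[j][k+n-1] += 1          # same column
    grid[j][k-(j-i)+n-1] += 1    # one diagonal
    grid[j][k+(j-i)+n-1] += 1    # other diagonal

for row in grid:
    print(row)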

(I will copy the Python code below in case someone wants to try it.)

At first glance, the second implementation should be much better than the
first one when it comes to the last rows of the grid: there, testing a cell is
a single lookup in the grid, while the first implementation has to scan all
the queens already placed above. One could think that for greater values
of n, the backtracking process will more often go near the bottom of the grid
(because it may be easier to find possible locations for new queens);
thus, as n grows, the second implementation should become even better.

This is true, but empirical benchmarks show "pathological" values of n: while
the time ratio seems to increase with some regularity, some values of n make
the second implementation suddenly "much better" than expected; the next
value of n then makes the ratio go back down to a more expected value.

For instance, n=6 and n=9 seem to be "pathological" values of n for the
backtracking process (meaning that the process probably spends much more time
at the bottom of the grid than it does for other values of n of the same
order of magnitude).
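
One way of making that claim more precise would be to profile where the search
spends its time. Here is a minimal sketch (the names profile and visits are
mine, not part of the code below) counting how many cells the backtracking
examines in each row, so that the distribution can be compared between
"pathological" and ordinary values of n:

def profile(n):
    visits = [0]*n               # cells examined in each row
    L = [-1]*n                   # L[a] = column of the queen in row a
    def rec(i):
        if i == n:
            return
        for j in range(n):
            visits[i] += 1
            # same attack test as in est_libre below
            if all(L[a] != j and a-L[a] != i-j and a+L[a] != i+j
                   for a in range(i)):
                L[i] = j
                rec(i+1)
    rec(0)
    return visits

for n in (6, 7, 9, 10):
    v = profile(n)
    t = sum(v)
    print(n, [round(x/t, 3) for x in v])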

Here are some results:

4 (2, 2) --> 1.2208835341365463
5 (10, 10) --> 1.5460399227301995
6 (4, 4) --> 3.418912573950606
7 (40, 40) --> 1.3577441077441077
8 (92, 92) --> 2.3758132726089785
9 (352, 352) --> 8.723393820500245
10 (724, 724) --> 4.212940734098863
11 (2680, 2680) --> 5.581994798201807
12 (14200, 14200) --> 6.181260906602268
13 (73712, 73712) --> 6.702137837219632

The first column is the value of n; then come the numbers of solutions
computed by the two versions; finally you can see the ratio between the
running times of the two implementations.
As you can see, n=6 and n=9 are "pathological".

Do you think a more rigorous criterion could be given for these "pathological" values?

I apologize for giving my Python code below, but maybe someone would be interested
in trying it (I strongly suggest using the PyPy interpreter rather than the
usual CPython one, because it will be _much_ quicker for this computation).

The second implementation described above (the grid version) is Implementation 1
in the code below; it is rather optimized, because I spent time on it. The "usual"
implementation described first is Implementation 2 below, and it is not really
optimized.

Best regards,



# -*- coding: utf-8 -*-


# Implementation 1: the padded-grid version.  mask[j][c] counts how many
# queens already placed attack the cell of row j whose padded column
# index is c (board column k maps to c = k+n-1).

def placement(i, n, mask):
    r = 0
    for k in range(n):
        if not mask[i][k+n-1]:           # cell (i, k) is not attacked
            if i == n-1:
                return 1                 # last row: at most one cell can be free
            for j in range(i+1, n):      # mark the cells attacked below
                mask[j][k+n-1] += 1
                mask[j][k-j+i+n-1] += 1
                mask[j][k+j-i+n-1] += 1
            r += placement(i+1, n, mask)
            for j in range(i+1, n):      # restore the counters on backtrack
                mask[j][k+n-1] -= 1
                mask[j][k-j+i+n-1] -= 1
                mask[j][k+j-i+n-1] -= 1
    return r

def resolution(n):
    if n == 0: return 1
    M = [ [0]*(3*n-2) for _ in range(n) ]
    return placement(0, n, M)

# Implementation 2: the "usual" version.  L[a] holds the column of the
# queen already placed in row a; est_libre ("is free") scans all the
# queens above to decide whether cell (i, j) is attacked.

def est_libre(i, j, L):
    test = True
    for a in range(i):
        b = L[a]
        if b == j or a-b == i-j or a+b == i+j:
            test = False
    return test

def placement2(i, L):
    r = 0
    n = len(L)
    if i == n:
        return 1
    for j in range(n):
        if est_libre(i, j, L):
            L[i] = j
            r += placement2(i+1, L)
    return r

def resolution2(n):
    L = [-1]*n
    return placement2(0, L)
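
# A quick sanity check: both implementations should reproduce the first
# solution counts quoted in the table above (n = 4 .. 10).
for n, expected in zip(range(4, 11), [2, 10, 4, 40, 92, 352, 724]):
    assert resolution(n) == expected == resolution2(n)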


# Benchmark: print n, the solution counts from both versions, and the
# ratio (time of implementation 2) / (time of implementation 1).

import time
n = 3
while True:
    n += 1
    t0 = time.time()
    s1 = resolution(n)
    t1 = time.time()
    s2 = resolution2(n)
    t2 = time.time()
    print(n, (s1, s2), "-->", (t2-t1)/(t1-t0))





-- 
Thomas Baruchel


