Various items, Dec 20 2006

Wed Dec 20 14:41:52 CET 2006

On 12/20/06, N. J. A. Sloane <njas at research.att.com> wrote:
> 4.  An interesting recent entry:
> %I A124104
> %S A124104 0,2,36,600,11100,235560,5746524
> %N A124104 Sum of the Rand distance between all pairs of set partitions of {1, 2, ... n}.
> %D A124104 W. Rand, Objective criteria for the evaluation of clustering methods. J. Amer. Stat. Assoc., 66 (336): 846-850, 1971.
> %e A124104 E.g. a(2) = 2 = 1 + 1 + 0 + 0 because the distance from 1,2 to 12 is 1 (and vice versa) and the distance from 1,2 to 1,2 or 12 to 12 is 0.
> %K A124104 hard,nonn,new
> %O A124104 1,2
> %A A124104 Andrey Goder (andy.goder(AT)gmail.com), Dec 12 2006
>
> - what is the Rand distance?

At http://www.cs.ucdavis.edu/~filkov/papers/filkov-ijait04.pdf they
explain it pretty well.

Given a partition of {1, ..., n}, look at a pair of elements.  If the
two elements are in the same block of the partition, they're called
"co-clustered".  The Rand distance between two partitions then counts
the number of pairs that are co-clustered in exactly one of the two
partitions.

The Rand index is normalized by dividing the Rand distance by (n choose 2).

Example: The distance from 12 3 4 to 1 234 is 4 because of the four
pairs 12 (in the first partition but not the second), and 23, 24, 34
(in the second partition but not the first).  The maximal distance of
6 is attained by 1 2 3 4 and 1234.

Apparently it originated in a 1971 paper by Rand in J Amer. Stat.
Assoc., 66(336):846-850.

It has some nice properties for a distance, like the triangle
inequality, and there are linear-time algorithms for computing it.

Seems like instead of just this entry with sum, we should have a table
with the number of pairs of partitions having each distance, and maybe
a few more seqs as well ...

--Joshua Zucker