making gifs automatically

Marc LeBrun mlb at well.com
Thu Nov 7 03:05:43 CET 2002


 >=Rick Shepherd
 > However, now it occurs to me that a more
 > sophisticated robot of the future (if not of the present)
 > could probably use OCR (Optical Character Recognition)
 > to convert the gif back to text and still harvest the addresses.

There are already very clever programs that make this extremely
difficult.  They use arbitrary fonts, tilt and mis-register the character
images, add noise and meaningless graphics (e.g. light grid lines), vary
the colors from image to image, and so on.
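
As a rough illustration of the kinds of distortion listed above, here is a
minimal pure-Python sketch.  It is not any of the programs referred to:
the 3x5 font, the names render_obfuscated and to_ascii, and every
parameter value are hypothetical.  It draws into an in-memory pixel grid
rather than writing an actual GIF, and it shows only mis-registration,
tilt, light grid lines and noise (no font or color variation).

```python
import random

GLYPH_W, GLYPH_H = 3, 5

# Hypothetical 3x5 bitmap glyphs ('#' = ink); only a few characters are defined.
FONT = {
    "a": [".#.", "#.#", "###", "#.#", "#.#"],
    "b": ["##.", "#.#", "##.", "#.#", "##."],
    "c": [".##", "#..", "#..", "#..", ".##"],
    "e": ["###", "#..", "##.", "#..", "###"],
    "@": ["###", "#.#", "#.#", "#..", "###"],
    ".": ["...", "...", "...", "...", ".#."],
}
BLANK = ["..."] * GLYPH_H  # undefined characters render as empty space


def render_obfuscated(text, seed=0, jitter=2, shear=3, grid=4, noise=0.05):
    """Return rows of pixels: 0 = background, 1 = ink, 2 = light grid line."""
    rng = random.Random(seed)  # seeded, so the same inputs give the same image
    height = GLYPH_H + 2 * jitter
    width = len(text) * (GLYPH_W + 1) + height // shear + 1
    img = [[0] * width for _ in range(height)]

    # Light grid lines first, so the characters are drawn over them.
    for y in range(height):
        for x in range(width):
            if x % grid == 0 or y % grid == 0:
                img[y][x] = 2

    for i, ch in enumerate(text):
        dy = jitter + rng.randint(-jitter, jitter)  # per-character vertical mis-registration
        x0 = i * (GLYPH_W + 1)
        for gy, row in enumerate(FONT.get(ch, BLANK)):
            y = gy + dy
            sx = (height - 1 - y) // shear  # tilt: higher rows shift further right
            for gx, cell in enumerate(row):
                if cell == "#":
                    img[y][x0 + gx + sx] = 1

    # Random noise: some pixels are forced to ink or to background.
    for y in range(height):
        for x in range(width):
            if rng.random() < noise:
                img[y][x] = rng.choice((0, 1))
    return img


def to_ascii(img):
    """Show the pixel grid as text: '#' ink, '.' grid line, ' ' background."""
    chars = {0: " ", 1: "#", 2: "."}
    return "\n".join("".join(chars[p] for p in row) for row in img)


if __name__ == "__main__":
    print(to_ascii(render_obfuscated("abc@e.c", seed=42)))
```

Even this toy version shows the economics: each image needs its own
de-skewing and denoising before any character matching can start.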

Certainly one could imagine a very sophisticated OCR program handling this
(after all, humans can!<;-) but it's not clear that it would be
economically feasible for a spambot web crawler to subject every random GIF
or JPEG it came across to such treatment, on the off chance that it might
yield a usable e-mail address.

I think there was a group at CMU (?) that had algorithms for generating
such images over a year ago.  Unfortunately I've been unable to track them
down... does anyone know who they were?

More information about the SeqFan mailing list