Sunday, April 27, 2008


The other day a friend told me about reCAPTCHA, a CAPTCHA service (a test meant to block spam bots) with an interesting twist. The developers (CMU CS students) are combining a web-scale CAPTCHA solution with an optical character recognition (OCR) system. OCR systems are not perfect: they have trouble with some scanned words that humans can typically read. By pairing a known word with an unknown one in each reCAPTCHA test, the developers can simultaneously sort humans from bots and enable Internet-scale crowdsourced OCR. The human-provided answers are being used to help the Internet Archive with its digitization efforts.
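To make the known/unknown word idea concrete, here is a rough Python sketch of how such a check might work. This is only my guess at the logic, not reCAPTCHA's actual code: the function names, the in-memory `votes` store, and the three-vote agreement threshold are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: for each unreadable word (by id),
# count the transcriptions that verified humans have typed.
votes = defaultdict(Counter)


def normalize(text):
    """Compare answers case-insensitively and ignore stray whitespace."""
    return text.strip().lower()


def check_response(control_answer, control_guess, unknown_id, unknown_guess):
    """Pass or fail the user on the known (control) word only.

    If the control word is right, the user is probably human, so their
    reading of the unknown word is recorded as one vote.
    """
    if normalize(control_guess) != normalize(control_answer):
        return False
    votes[unknown_id][normalize(unknown_guess)] += 1
    return True


def consensus(unknown_id, min_votes=3):
    """Return the most common transcription once it has at least
    min_votes votes (an assumed threshold), otherwise None."""
    counts = votes.get(unknown_id)
    if not counts:
        return None
    word, count = counts.most_common(1)[0]
    return word if count >= min_votes else None
```

Each call to `check_response` stands for one person solving one test. Once enough people who passed the control word agree on the same reading, `consensus` returns it as the transcription for the word the OCR could not read.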

reCAPTCHA is available for use on your own site, and there are plugins that make it even easier to add to WordPress- or MediaWiki-based sites.
