OCR (Optical Character Recognition) software struggles to recognize text set against complex, multi-colored backgrounds. Pretext addresses this problem.
Pretext is designed to do just one thing, and do it well: find words of text (*) in any image, whether computer-generated, scanned or photographed, and output a black-and-white image containing just those words. Imagine a picture of a road: Pretext finds the words on car number plates and signposts, but ignores the road, the sky and the trees.
Existing OCR software combines finding text with recognizing it. Pretext separates the two tasks so that, by doing the first stage better, it makes the actual conversion of the words more accurate.
Pretext is also designed to find more text than traditional OCR methods do.
(*) Pretext is designed to work with the Latin alphabet ("abc..."). It will have some success with similar alphabets such as Greek and Cyrillic. It is not designed to work with languages such as Farsi, Japanese or Chinese.
Pretext is not an OCR program. Instead, it removes the background "noise" so that an OCR program can give more accurate results.
Here is an unedited snippet from a printed magazine page.
Here is the result of running FreeOCR on the Pretexted image:
¤I\rI8l475 PTA! ANEWSFIELD PUBLICAHON
nu as Juan msn
W. MAGAZINE
AND TWO CASSETTES
E2 99
SINCLAIR SPECTRUM GAMES
Email San Fran Systems at sanfransys@decompiler.org.
San Fran Systems and Pretext logos and text (c) 2006-2010 San Fran Systems