San Fran Systems: Pretext Beta 1.0 OUT NOW! [2010-09-15]

Download beta 1.0

Fully Functional
Works under WINE on Linux
Expires October 15th, 2010

Report bugs to receive a free license for version 1.0 when it ships.

The Problem it Solves

OCR (Optical Character Recognition) software does not reliably recognize text set against complex, multi-colored backgrounds. Pretext addresses this issue.

What Pretext does

Pretext is designed to do just one thing, and do it well: find words of text (*) in any image, whether computer-generated, scanned or photographed, and output a black-and-white image containing just those words. Imagine a photograph of a road. Pretext finds the words on car number plates and signposts, but ignores the road, the sky and the trees.

Existing OCR software combines finding text with recognizing it. Pretext separates the two tasks, so that by doing the first stage (finding the text) better, the second stage (converting the words) becomes more accurate.

Pretext is also designed to find more text than traditional OCR methods do.

(*) Pretext is designed to work with the Latin alphabet ("abc..."). It will have some success with similar alphabets such as Greek and Cyrillic. It is not designed to work with the scripts used for languages such as Farsi, Japanese or Chinese.

What Pretext doesn't do

Pretext is not an OCR program. Instead, it removes the background "noise" so that an OCR program can give more accurate results.

What OCR software is and does

OCR software turns pictures of text into actual text that you can copy and paste, edit in a word processor, send in an email, or put on a webpage.

Documentation

User's Guide

Frequently Asked Questions

Example

Here is an unedited snippet from a printed magazine page.

Here is the result of running FreeOCR on the Pretexted image:

дI\rI8l475 PTA! ANEWSFIELD PUBLICAHON
nu as Juan msn
  W. MAGAZINE
AND TWO CASSETTES
E2 99
SINCLAIR SPECTRUM GAMES

Email San Fran Systems at sanfransys@decompiler.org.