
Xerox has long been known as The Document Company, and though it faces stiff competition in the OCR market these days, it has pulled out a winner with this offering.
I do a lot of scanning and OCR work for another group, DharmaNet. They are compiling electronic editions of out-of-print books whose publishers have granted them permission to create such editions. Many of the books are in another language and must be transliterated as well as transcribed. My scanner is an Envisions 8100, and until now I used the OEM copy of TextBridge Version 1.01 included with the scanner. My test system was a 486-66 with 16 megs of RAM and 50 megs of virtual memory under Windows 3.1. Xerox recommends 8 megs of RAM and 8 megs of virtual memory. The new version is Windows 95 compatible.
Installation was flawless using the six enclosed disks: five for the program and one for the sample TIFF documents used in the tutorial. It took about 30 minutes to run the install routine. The documentation and tutorial (print copy with examples) took me about another 30 minutes to run through. The features are what make this program shine.
Most OCR programs, including the old TextBridge, will scan a page into a TIFF file and then convert it to an ASCII text document. From there you have to go in and redo the entire layout, unless all you want is the written information.
Version 3.02 fixes all that. It will add a macro to almost any Windows text processor so you can run TextBridge from within that application. Among those supported are Word 6.0+, WordPerfect 6.0+, Ami Pro and WordPro96, WordStar, Write and Notepad.
WYSIWYG output is the main wonder. It will read your page and convert it, along with its graphics and its layout, directly into your application, and all portions are fully editable. No cutting and pasting is required on the user end. It’s all automatic. The hours this saves are incredible. True document recomposition is supported in Word and WordPerfect.
One major advantage of this feature: when you read in a multi-column layout, it is recognized, displayed and fully editable as a multi-column document. Multi-column pages have been a headache for ages for those of us who scan articles for redisplay.
TextBridge will proof your doc within your host Word or WordPerfect application, or within TextBridge itself. The TextBridge engine is still trainable. Specifically, as you proof, you can accept, reject or change corrections, and the next time it “sees” that text, it knows the proper correction. Word recognition thus improves with use within a specific document. Using this feature on a one-page scan is not so important, but when you have many pages in the same typeface and resolution or (as in my case) entire books, this feature is indispensable.
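For readers who like to see an idea spelled out, here is a rough Python sketch of what a "trainable" proofing pass could look like. It is only an illustration of the concept: the CorrectionMemory class and its methods are names I made up, and nothing here reflects how Xerox actually implements training inside TextBridge.

```python
class CorrectionMemory:
    """Hypothetical store of proofing decisions (not TextBridge's real engine)."""

    def __init__(self):
        # Maps a misread word (lowercased) to the correction the user chose.
        self._fixes = {}

    def learn(self, misread, corrected):
        """Record a correction accepted during proofing."""
        self._fixes[misread.lower()] = corrected

    def apply(self, text):
        """Replace every remembered misread in whitespace-separated text."""
        words = text.split()
        return " ".join(self._fixes.get(w.lower(), w) for w in words)


memory = CorrectionMemory()
memory.learn("tour-day", "four-day")   # decision made once while proofing
print(memory.apply("a tour-day sale"))  # prints: a four-day sale
```

A simple lookup like this only matches whole words and ignores attached punctuation, so a real engine presumably does something more refined. The point is just that one decision made while proofing can be reused on every later page.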
Some other neat TextBridge features are worth mentioning. It provides broad scanner support through ISIS, HP AccuPage 2.0, and the TWAIN standard. Image processing is supported in TIFF, PCX, DCX, and Windows BMP. It will output text to all the word processor formats mentioned above, as well as to Excel, PostScript, and the Windows Clipboard in both RTF and ASCII formats.
One thing that is essentially unchanged is the quality of character recognition itself, as I found by processing the sample documents with both the old 1.01 engine and the new 3.02 engine. In my sample doc, both recognized about 99% of the words and characters, and both misread the word “offer” as “otter”. Both flagged a poor copy of “four-day” as suspect when they “saw” “tour-day”.
Overall, I’m very impressed with the improvements, which will save me a good 500 hours of formatting this year alone. It’s available at Egghead for $314.98, or as an upgrade for $156.98. Take 10% off with your cue card and PC Alamode coupon.
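If you want to know what that works out to at the register, here is a quick Python sketch of the arithmetic. It assumes the 10% is a single discount taken off the listed price and that it applies to both the full package and the upgrade; check with the store, since I have not confirmed either point. The discounted helper is my own name for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def discounted(price, percent_off):
    """Return the price after a percentage discount, rounded to the cent."""
    factor = (Decimal(100) - Decimal(percent_off)) / Decimal(100)
    return (Decimal(price) * factor).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


# Assumption: one 10% discount on each listed price, before any sales tax.
full_price = discounted("314.98", 10)
upgrade_price = discounted("156.98", 10)
print(full_price, upgrade_price)  # prints: 283.48 141.28
```

Under those assumptions the full package comes to about $283.48 and the upgrade to about $141.28.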