Discussion:
editable text file problem
RB
2006-03-12 16:09:41 UTC
Why do scanners/OCR programs not handle tables very well when saving to
editable text? They all seem to be afflicted with this malady.

Trying to get an editable table of columns of data from a scanned page has
turned out to be mission impossible.

Anyone know a simple way around this?
Danny
2006-03-13 06:43:15 UTC
Hi RB,

I'm not an expert on recognition technology, but the primary challenge
is that OCR engines in general have difficulty handling matrices
(tables). The vertical and horizontal ruling lines, combined with
sometimes small amounts of whitespace between words, can easily throw
off recognition engines.
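
A rough sketch of the whitespace issue, working on already-recognised
fixed-width text rather than on the scanned image. It treats a character
position as a column gap only if it is blank in every row, and it is not
how any particular OCR engine works. The names find_column_spans,
split_row and min_gap are made up for this illustration.

```python
def find_column_spans(rows, min_gap=2):
    """Return (start, end) character spans of the columns in fixed-width rows.

    A gap counts as a column separator only if it is blank in every row
    and at least min_gap characters wide (an assumed threshold).
    """
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    # blank[i] is True when every row has a space at position i.
    blank = [all(r[i] == " " for r in padded) for i in range(width)]

    spans = []
    start = None
    i = 0
    while i < width:
        if not blank[i]:
            if start is None:
                start = i
            i += 1
            continue
        # Measure the run of blank positions starting at i.
        j = i
        while j < width and blank[j]:
            j += 1
        # Close the current column if the gap is wide enough or the line ends.
        if start is not None and (j - i >= min_gap or j == width):
            spans.append((start, i))
            start = None
        i = j
    if start is not None:
        spans.append((start, width))
    return spans


def split_row(row, spans):
    """Cut one row into cell strings using the spans found above."""
    return [row[a:b].strip() for a, b in spans]
```

If the gaps between columns are narrower than min_gap, neighbouring
columns are merged into one span, which is the text-level version of the
problem described above.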

Just like everything else, not all OCR engines are created equal. Some
will definitely perform better than others (the general rule of thumb
"you get what you pay for" usually applies).

If you have a heavy volume of these types of documents/forms, and they
can be categorised into standard forms, then forms-processing software
of the kind used for OMR (optical mark recognition) could be a good
solution. Such software uses anchor points (black boxes on the corners
of the document) to line up each scan, then performs zonal OCR using a
template that you preset for that specific matrix/table.
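
The template idea can be sketched roughly as follows. This assumes the
OCR step has already produced words with pixel positions and that the
anchor mark has already been located on the scan. The zone names,
coordinates and the read_zones function are hypothetical, and the sketch
only corrects for a shifted page, not for rotation or scaling.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # centre of the word box, in scan pixels
    y: float


# Hypothetical template: where the anchor mark sits on the reference form,
# and one (left, top, right, bottom) box per single-line zone.
TEMPLATE_ANCHOR = (50, 50)
TEMPLATE_ZONES = {
    "name": (100, 200, 400, 240),
    "qty": (420, 200, 500, 240),
}


def read_zones(words, scan_anchor,
               template_anchor=TEMPLATE_ANCHOR, zones=TEMPLATE_ZONES):
    """Assign recognised words to template zones after shifting by the anchor."""
    # How far this scan is shifted relative to the reference form.
    dx = scan_anchor[0] - template_anchor[0]
    dy = scan_anchor[1] - template_anchor[1]

    found = {name: [] for name in zones}
    # Left-to-right order is enough because each zone is assumed single-line.
    for w in sorted(words, key=lambda w: w.x):
        tx, ty = w.x - dx, w.y - dy  # map back into template coordinates
        for name, (left, top, right, bottom) in zones.items():
            if left <= tx < right and top <= ty < bottom:
                found[name].append(w.text)
                break
    return {name: " ".join(texts) for name, texts in found.items()}
```

Words that fall outside every zone are simply ignored, which is what
keeps ruling lines and stray marks out of the result.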

hope it helps~

Danny
RB
2006-03-13 15:07:02 UTC
Thanks for the explanation. I only have an occasional, personal need for
this. On the few occasions I've tried it, I've been stymied. I tried
cutting and pasting, etc., but never got around the problem.

What I was trying to do was put word-processed tables and charts into a
database. (I didn't have the word processor files, only the hard copy
tables.) If I could have converted the hard copy docs to editable text, I
could then have imported the files into the database and avoided having to
manually enter all that data.
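
For reference, the import half of that plan is the easy part once the
text is in clean delimited form. A minimal sketch, assuming tab-separated
text with a header row and using SQLite purely as a stand-in database;
load_table and the sample column names are made up for illustration.

```python
import csv
import io
import sqlite3


def load_table(conn, table_name, text, delimiter="\t"):
    """Create a table from delimited text and return the number of rows loaded.

    Assumes the first line is a header and that the table and column names
    are trusted (they are quoted but not otherwise sanitised).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    # Skip rows whose cell count does not match the header, a common
    # symptom of a misread table line.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({marks})', rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = "Name\tQty\tPrice\nBolt\t12\t0.30\nWasher\t5\t0.05\n"
    print(load_table(conn, "parts", sample))
```

The hard part remains getting the OCR output into that clean shape in
the first place.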

No big deal. I was just hoping there was some little trick I was missing.
Seems not.
