Improving OCR Recommendations Needed

Discuss Best Practices for: Managing Paper files, Document Gathering, Imaging, Coding, OCR, etc.

Moderators: Mark Lieb, Pamela

Post by cannonbrown » Mon Jun 13, 2005 9:11 pm

Our service bureau is currently delivering what we believe is substandard OCR for use with our litigation databases: when we scanned a sample of the same documents in-house with Adobe, we got substantially better results.

We don't particularly want to do this processing in-house, and feel the solution might lie in knowing which software packages commonly used by service bureaus work best with litigation databases (i.e., can produce good OCR, create load files, output TIFFs, endorse pages, etc.). Then, at least, we can shop intelligently for a new provider, or try the in-house option. Concordance is our database software.

Anyone reasonably happy with their database OCR?

Thanks
cannonbrown
Contributor
Posts: 6
Joined: Mon Mar 07, 2005 12:00 am
Location: Seattle, WA

substandard OCR

Post by Johnrand3 » Tue Jun 14, 2005 1:52 pm

I have heard that I-Archives out of Dallas has come out with OWR (Optical Word Recognition), which I hear is much better than OCR. I have also heard that a coding bureau called IQWEST has made strides with their OCR.

John Randall
Litigation Support Coordinator
Discovery Document Solutions
202-466-2366
www.discoverydc.com
Johnrand3
Contributor
Posts: 29
Joined: Tue Jun 17, 2003 11:00 pm

Garbage In.....

Post by Larry Lieb » Tue Jun 14, 2005 2:06 pm

It seems to me that people have been getting around poor OCR results pretty well using fuzzy searching techniques.
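
For readers who have not used fuzzy searching, here is a minimal sketch of the idea in Python. It relies only on the standard library's `difflib`; the function name `fuzzy_find`, the sample sentence, and the 0.75 similarity cutoff are illustrative assumptions, not how Concordance or any other litigation database implements the feature.

```python
import difflib
import re

def fuzzy_find(term, text, cutoff=0.75):
    """Return the distinct words in `text` that are similar to `term`.

    Similarity is difflib's ratio (0.0 to 1.0); `cutoff` is an assumed
    threshold chosen for this illustration.
    """
    # Split the OCR text into lowercase alphanumeric tokens.
    words = sorted(set(re.findall(r"[A-Za-z0-9]+", text.lower())))
    # Keep up to 10 of the closest tokens at or above the cutoff.
    return difflib.get_close_matches(term.lower(), words, n=10, cutoff=cutoff)

# Hypothetical OCR output with two typical misreads: "rn" for "m", "0" for "o".
ocr_text = "The rnerger agreement was signed by the c0mpany on Monday."

print(fuzzy_find("merger", ocr_text))
print(fuzzy_find("company", ocr_text))
```

A lower cutoff catches more damaged words but also returns more false hits, which is the usual trade-off with this approach.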

Would not Optical Word Recognition be just as hampered by illegible/handwritten text as standard OCR?
Larry Lieb

Director of Electronic Discovery

Esquire Litigation Solutions

312-371-7970
Larry Lieb
Site Admin
Posts: 121
Joined: Tue Apr 15, 2003 11:00 pm
Location: National

iArchives OWR

Post by RowandK » Wed Jun 15, 2005 3:24 pm

The iArchives folks use algorithms that link back to dictionaries and word lists. When their algorithms find something that isn't quite right, they can output ALL the most likely correct values, enclosed in delimiters. Here's an example:

Microsoft's Share of the Browser Market
One Month Incremental Share of Browser Usage
(According to &*& AdKnowledge Acknowledge &^&, Inc.)



In this case AdKnowledge and Acknowledge are the possible choices, with &*& and &^& as the delimiters. (Note that they can output the results without the choices, and the delimiters are customizable.)
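
As a rough illustration of how delimited output like this could be read downstream, the Python sketch below pulls the candidate words out of each delimited span. The function name `extract_alternatives` is hypothetical, and the sketch assumes candidates are single words separated by whitespace, as in the example above; the real OWR output format may differ.

```python
import re

def extract_alternatives(text, open_delim="&*&", close_delim="&^&"):
    """Return one list of candidate words per delimited span in `text`.

    Assumes each candidate is a single whitespace-separated word.
    """
    # Escape the delimiters so characters such as * and ^ are matched literally.
    pattern = re.compile(
        re.escape(open_delim) + r"\s*(.*?)\s*" + re.escape(close_delim)
    )
    return [match.group(1).split() for match in pattern.finditer(text)]

line = "(According to &*& AdKnowledge Acknowledge &^&, Inc.)"
print(extract_alternatives(line))
```

Because the delimiters are passed in as parameters, the same function works if the output is configured with different delimiter strings.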

I've worked with them so that ALCoder recognizes OWR for names and dates quite nicely.

Where this is superior to fuzzy searching is that you can build or winnow values as desired and eliminate some of the errors before users get the database.
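
One way to picture the winnowing step is the Python sketch below. It assumes the delimited format from the example above and a hypothetical list of approved names; `resolve_alternatives` is an illustrative name, not part of any iArchives or ALCoder tool. A delimited span is replaced only when exactly one candidate is on the approved list, and anything ambiguous is left untouched for manual review.

```python
import re

def resolve_alternatives(text, approved, open_delim="&*&", close_delim="&^&"):
    """Replace each delimited span with its single approved candidate.

    `approved` is an assumed, user-supplied collection of known-good values.
    Spans with zero or several approved candidates are left unchanged.
    """
    pattern = re.compile(
        re.escape(open_delim) + r"\s*(.*?)\s*" + re.escape(close_delim)
    )
    approved_lower = {word.lower() for word in approved}

    def pick(match):
        candidates = match.group(1).split()
        kept = [c for c in candidates if c.lower() in approved_lower]
        # Only resolve the span when the choice is unambiguous.
        return kept[0] if len(kept) == 1 else match.group(0)

    return pattern.sub(pick, text)

line = "(According to &*& AdKnowledge Acknowledge &^&, Inc.)"
print(resolve_alternatives(line, approved={"AdKnowledge"}))
```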

For more info on OWR, contact Jared Dearth at jdearth@iarchives.com. Ask him about their multiple name demo.
"Document coding for pennies, in minutes."
www.alcoder.com, email: rowandk@alcoder.com
RowandK
Moderator
Posts: 28
Joined: Tue May 13, 2003 11:00 pm
Location: DC Metro

