Invention Grant
- Patent Title: OCR of books by word recognition
- Application No.: US12103717
- Application Date: 2008-04-16
- Publication No.: US08014604B2
- Publication Date: 2011-09-06
- Inventors: Asaf Tzadok, Eugeniusz Walach
- Applicants: Asaf Tzadok, Eugeniusz Walach
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Main IPC: G06K9/34
- IPC: G06K9/34 ; G06K9/68 ; G06K7/10

Abstract:
Disclosed embodiments of the invention provide automated methods and systems for global optimization of OCR, tailored to each document being digitized. A document-specific database, containing an exhaustive list of the words in the document, is created from an OCR scan of the document of interest. Images of each word, in every font encountered, are entered into the database and mapped to the corresponding textual representation. After the first image of a word in a particular font has been entered, each new occurrence of that word in that font can be recognized quickly by image-processing techniques. The disclosed methods and systems may be used in conjunction with adaptive character-recognition training and word-recognition training of the OCR engines.
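The per-document word lookup described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: word images are assumed to be already segmented and binarized into tuples of 0/1 pixels, fonts are identified by a string label, and a new image is treated as the same word when at most `max_diff` pixels differ from a stored image of the same size and font. `WordImageDB`, `recognize`, `hamming`, and the `ocr_engine` callable are hypothetical names; a real system would likely use more robust image matching than raw pixel comparison.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a binarized word image as rows of 0/1 pixels.
WordImage = Tuple[Tuple[int, ...], ...]


def hamming(a: WordImage, b: WordImage) -> Optional[int]:
    """Count differing pixels; return None if the images differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


@dataclass
class WordImageDB:
    """Document-specific map from (font, word image) to text (sketch only)."""
    max_diff: int = 2  # assumed pixel-mismatch tolerance, not from the source
    entries: Dict[str, List[Tuple[WordImage, str]]] = field(default_factory=dict)

    def add(self, font: str, image: WordImage, text: str) -> None:
        # Store one more image instance of a word, grouped by font.
        self.entries.setdefault(font, []).append((image, text))

    def lookup(self, font: str, image: WordImage) -> Optional[str]:
        # Return the text of the closest stored image in the same font,
        # provided it is within the mismatch tolerance.
        best: Optional[Tuple[int, str]] = None
        for stored, text in self.entries.get(font, []):
            d = hamming(stored, image)
            if d is not None and d <= self.max_diff and (best is None or d < best[0]):
                best = (d, text)
        return best[1] if best is not None else None


def recognize(db: WordImageDB, font: str, image: WordImage,
              ocr_engine: Callable[[WordImage], str]) -> str:
    """Reuse a stored match if one exists; otherwise run OCR and store the result."""
    text = db.lookup(font, image)
    if text is None:
        text = ocr_engine(image)  # first occurrence of this word in this font
        db.add(font, image, text)
    return text
```

With this design, only the first occurrence of a word in a given font reaches the (assumed) OCR engine; later occurrences that match closely are resolved from the database. The same word in a different font is treated as new, consistent with the abstract's per-font mapping.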
Published/granted literature:
- US20090263019A1: OCR of books by word recognition (publication date: 2009-10-22)