Invention Grant
- Patent Title: Efficient identification and correction of optical character recognition errors through learning in a multi-engine environment
- Patent Title (中): 通过在多引擎环境中学习,有效识别和校正光学字符识别错误
- Application No.: US13619853
- Application Date: 2012-09-14
- Publication No.: US09053350B1
- Publication Date: 2015-06-09
- Inventor: Ahmad E. Abdulkader, Matthew R. Casey
- Applicant: Ahmad E. Abdulkader, Matthew R. Casey
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Fenwick & West LLP
- Main IPC: G06K9/00
- IPC: G06K9/00; G06K9/62; G06K9/03

Abstract:
OCR errors are identified and corrected through learning. An error probability estimator is trained using ground truths to learn error probability estimation. Multiple OCR engines process a text image, and convert it into texts. The error probability estimator compares the outcomes of the multiple OCR engines for mismatches, and determines an error probability for each of the mismatches. If the error probability of a mismatch exceeds an error probability threshold, a suspect is generated and grouped together with similar suspects in a cluster. A question for the cluster is generated and rendered to a human operator for answering. The answer from the human operator is then applied to all suspects in the cluster to correct OCR errors in the resulting text. The answer is also used to further train the error probability estimator.
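
The flow in the abstract (compare engine outputs, score each mismatch, keep those above a threshold as suspects, cluster similar suspects, apply one human answer to the whole cluster) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: the names `Suspect`, `find_mismatches`, `estimate_error_probability`, `cluster_suspects` and `apply_answer` are hypothetical, the engine outputs are assumed to be already aligned token by token, the estimator is a simple disagreement ratio standing in for the trained error probability estimator, and "similar" suspects are taken to be those with the same set of candidate readings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Suspect:
    position: int      # token index in the aligned engine outputs
    candidates: tuple  # one reading per engine at that position


def find_mismatches(engine_outputs):
    """Yield (position, readings) wherever the engines disagree.

    Assumes the outputs are equal-length, token-aligned lists.
    """
    for pos, readings in enumerate(zip(*engine_outputs)):
        if len(set(readings)) > 1:
            yield pos, readings


def estimate_error_probability(candidates):
    """Stand-in estimator (assumption): share of engines that disagree
    with the most common reading. The abstract describes an estimator
    trained on ground truths instead."""
    top_count = max(candidates.count(c) for c in set(candidates))
    return 1.0 - top_count / len(candidates)


def cluster_suspects(engine_outputs, threshold):
    """Group suspects whose error probability exceeds the threshold.

    Clustering key (assumption): the set of candidate readings.
    """
    clusters = defaultdict(list)
    for pos, cands in find_mismatches(engine_outputs):
        if estimate_error_probability(cands) > threshold:
            clusters[frozenset(cands)].append(Suspect(pos, cands))
    return dict(clusters)


def apply_answer(tokens, cluster, answer):
    """Apply one operator answer to every suspect in a cluster."""
    corrected = list(tokens)
    for suspect in cluster:
        corrected[suspect.position] = answer
    return corrected
```

With three hypothetical engine outputs that disagree on "modern" versus "rnodern" at two positions, both mismatches fall into a single cluster, so one answer from the operator corrects both. In the described system that answer would also be fed back as training data for the estimator, which this sketch leaves out.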