Invention Grant
US09053350B1 Efficient identification and correction of optical character recognition errors through learning in a multi-engine environment (Active)

Efficient identification and correction of optical character recognition errors through learning in a multi-engine environment
Abstract:
OCR errors are identified and corrected through learning. An error probability estimator is trained on ground truths to learn error probability estimation. Multiple OCR engines process a text image and convert it into text. The error probability estimator compares the outputs of the multiple OCR engines for mismatches and determines an error probability for each mismatch. If the error probability of a mismatch exceeds an error probability threshold, a suspect is generated and grouped with similar suspects in a cluster. A question for the cluster is generated and presented to a human operator for answering. The operator's answer is then applied to all suspects in the cluster to correct OCR errors in the resulting text. The answer is also used to further train the error probability estimator.
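
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: every name here (`Suspect`, `find_suspects`, `cluster_suspects`, `apply_answer`, `toy_estimator`) is hypothetical, the engine outputs are assumed to be already aligned token lists of equal length, the trained estimator is replaced by a simple disagreement ratio, and "similar" suspects are assumed to be those with identical readings across engines.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Suspect:
    position: int    # token index in the aligned engine outputs
    readings: tuple  # one reading per OCR engine at that index


def toy_estimator(readings):
    """Stand-in for the trained estimator (assumption): the share of
    engines that disagree with the most common reading."""
    majority = Counter(readings).most_common(1)[0][1]
    return 1 - majority / len(readings)


def find_suspects(engine_outputs, estimate, threshold):
    """Flag mismatches whose estimated error probability exceeds the threshold."""
    suspects = []
    for i, readings in enumerate(zip(*engine_outputs)):
        if len(set(readings)) > 1 and estimate(readings) > threshold:
            suspects.append(Suspect(i, readings))
    return suspects


def cluster_suspects(suspects):
    """Group suspects with identical readings (assumed similarity rule)."""
    clusters = defaultdict(list)
    for s in suspects:
        clusters[s.readings].append(s)
    return dict(clusters)


def apply_answer(tokens, cluster, answer):
    """Apply one operator answer to every suspect in a cluster."""
    corrected = list(tokens)
    for s in cluster:
        corrected[s.position] = answer
    return corrected


# Three hypothetical engines reading the same five-token image.
engine_a = ["the", "rn0dem", "is", "a", "rn0dem"]
engine_b = ["the", "modem", "is", "a", "modem"]
engine_c = ["the", "modern", "is", "a", "modern"]

suspects = find_suspects([engine_a, engine_b, engine_c], toy_estimator, 0.5)
clusters = cluster_suspects(suspects)

# One question is asked per cluster; "modem" plays the operator's answer.
key = ("rn0dem", "modem", "modern")
corrected = apply_answer(engine_a, clusters[key], "modem")
```

In this toy run both mismatches fall into a single cluster, so one operator answer corrects both positions. The further-training step from the abstract is omitted, since the form of the estimator is not specified here.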