Multi-modal learning based intelligent enhancement of post optical character recognition error correction

Invention Grant

US11842524B2 Multi-modal learning based intelligent enhancement of post optical character recognition error correction (Active)


Patent Title: Multi-modal learning based intelligent enhancement of post optical character recognition error correction
Application No.: US17245349

Application Date: 2021-04-30
Publication No.: US11842524B2

Publication Date: 2023-12-12
Inventor: Rajesh M. Desai, Ayush Utkarsh, Nazrul Islam, Praveen Vyas
Applicant: International Business Machines Corporation
Applicant Address: US NY Armonk
Assignee: International Business Machines Corporation
Current Assignee: International Business Machines Corporation
Current Assignee Address: US NY Armonk
Agent: Stephen J. Walder, Jr.; Matt Zebrec
Main IPC: G06K9/03
IPC: G06K9/03 ; G06V10/40 ; G06F40/126 ; G06F40/109 ; G06N20/00 ; G06F40/232 ; G06V30/10 ; G06F18/214 ; G06V30/19 ; G06V30/12 ; G06V10/82 ; G06V30/26 ; G06F18/213 ; G06N3/0464 ; G06N3/0442 ; G06N3/0455 ; G06V10/98 ; G06F40/279 ; G06V30/242

Abstract:

A mechanism is provided for implementing an optical character recognition (OCR) error correction mechanism for correcting OCR errors. Responsive to receiving a document on which OCR has been performed, the mechanism assesses the document, using a set of visual embeddings, to identify a set of OCR errors generated by the OCR engine that performed the OCR. Responsive to identifying the set of OCR errors, the mechanism analyzes each character of a plurality of sentences within the document to generate a high-dimensional embedding for the characters of those sentences. The mechanism then linguistically corrects each OCR error in the set of OCR errors. The mechanism utilizes ground truth information and the set of visual embeddings to verify that the character stream is linguistically correct. Responsive to verifying that the character stream is linguistically correct, the mechanism outputs an OCR error corrected document to a user.
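
The flow the abstract describes (detect errors, correct them linguistically, verify, output) can be illustrated with a minimal sketch. This is not the patented implementation: the learned visual and character embeddings are replaced here by an assumed per-token `visual_score`, the ground truth information by a small hypothetical `VOCABULARY`, and the linguistic correction by plain string similarity from Python's standard `difflib` module. All names in the block are illustrative assumptions.

```python
from dataclasses import dataclass
from difflib import get_close_matches

# Hypothetical stand-in for ground truth information: a list of known-good words.
VOCABULARY = ["optical", "character", "recognition", "error", "correction"]


@dataclass
class Token:
    text: str
    # Assumed score in [0, 1] standing in for evidence from visual embeddings;
    # a low value means the rendered glyphs poorly match the recognized text.
    visual_score: float


def detect_errors(tokens, threshold=0.5):
    """Return indices of tokens flagged as likely OCR errors."""
    return [
        i
        for i, t in enumerate(tokens)
        if t.visual_score < threshold or t.text.lower() not in VOCABULARY
    ]


def correct_token(text):
    """Pick the closest vocabulary word; keep the original if none is close."""
    matches = get_close_matches(text.lower(), VOCABULARY, n=1, cutoff=0.6)
    return matches[0] if matches else text


def verify(words):
    """Check that every word of the corrected stream is a known-good word."""
    return all(w.lower() in VOCABULARY for w in words)


def correct_document(tokens):
    """Detect, correct, then verify; returns (corrected words, verified flag)."""
    words = [t.text for t in tokens]
    for i in detect_errors(tokens):
        words[i] = correct_token(words[i])
    return words, verify(words)
```

For example, passing tokens for "optical", "charactcr" and "rec0gnition" (the last two with low assumed scores) through `correct_document` flags the two misrecognized tokens and maps them to their nearest vocabulary entries. A real system of the kind the abstract describes would rely on trained embedding models rather than a fixed word list.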

Public/Granted literature

US20220350998A1 Multi-Modal Learning Based Intelligent Enhancement of Post Optical Character Recognition Error Correction
Public/Granted day: 2022-11-03
