Invention Grant
- Patent Title: System and method to extract information from unstructured image documents
- Application No.: US17405964
- Application Date: 2021-08-18
- Publication No.: US11769341B2
- Publication Date: 2023-09-26
- Inventor: Yashu Seth, Ravil Kashyap, Shaik Kamran Moinuddin, Vijayendra Mysore Shamanna, Henry Thomas Peter, Simha Sadasiva
- Applicant: Ushur, Inc.
- Applicant Address: US CA Santa Clara
- Assignee: Ushur, Inc.
- Current Assignee: Ushur, Inc.
- Current Assignee Address: US CA Santa Clara
- Agency: Lowenstein Sandler LLP
- Agent: Madhumita Datta
- Main IPC: G06V30/40
- IPC: G06V30/40; G06T7/10; G06V10/94; G06F18/24; G06V30/19; G06V30/148

Abstract:
The present disclosure relates to a system and method to extract information from unstructured image documents. The extraction technique is content-driven and does not depend on the layout of a particular image document type. The disclosed method breaks an image document down into smaller images using a text cluster detection algorithm. The smaller images are converted into text samples using optical character recognition (OCR). Each text sample is fed to a trained machine learning model, which classifies it into one of a plurality of pre-determined field types. Extraction of the desired value may then be treated as a question-answering problem handled by a pre-trained model, with a fixed question formed on the basis of the classified field type. The output of the question-answering model may be passed through a rule-based post-processing step to obtain the final answer.
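The later stages described in the abstract (classification of each text sample, formation of a fixed question per field type, question answering, and rule-based post-processing) can be illustrated with a minimal Python sketch. It is not the patented implementation. The field types, the questions, and the functions `classify`, `answer`, `post_process`, and `extract_fields` are all hypothetical, and the trained classifier and pre-trained question-answering model are replaced by trivial stand-ins so that the example runs without external libraries. Text cluster detection and OCR are assumed to have already produced the text samples.

```python
import re
from typing import Dict, List, Optional

# Hypothetical field types and their fixed questions (not taken from the patent).
QUESTIONS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "date": "What is the date?",
}


def classify(text: str) -> Optional[str]:
    """Stand-in for the trained classifier: simple keyword matching."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice_number"
    if "date" in lowered:
        return "date"
    return None  # sample does not belong to any field type of interest


def answer(question: str, context: str) -> str:
    """Stand-in for the pre-trained question-answering model.

    A real model would use the question; this placeholder simply
    returns whatever follows the first colon in the context.
    """
    return context.split(":", 1)[-1]


def post_process(field_type: str, raw: str) -> str:
    """Rule-based cleanup of the raw answer."""
    raw = raw.strip()
    if field_type == "date":
        match = re.search(r"\d{4}-\d{2}-\d{2}", raw)
        return match.group(0) if match else raw
    return raw


def extract_fields(text_samples: List[str]) -> Dict[str, str]:
    """Classify each sample, ask the fixed question, and clean the answer."""
    results: Dict[str, str] = {}
    for sample in text_samples:
        field_type = classify(sample)
        if field_type is None:
            continue
        question = QUESTIONS[field_type]
        results[field_type] = post_process(field_type, answer(question, sample))
    return results


if __name__ == "__main__":
    samples = ["Invoice No: A-123 ", "Date: issued 2021-08-18", "Thank you"]
    print(extract_fields(samples))
```

Because each text sample is handled according to its content rather than its position on the page, the same loop applies regardless of document layout, which is the property the abstract emphasizes.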
Public/Granted literature:
- US20220058383A1, SYSTEM AND METHOD TO EXTRACT INFORMATION FROM UNSTRUCTURED IMAGE DOCUMENTS, Public/Granted day: 2022-02-24