Invention Grant
- Patent Title: Document information extraction for computer manipulation
- Application No.: US16941349
- Application Date: 2020-07-28
- Publication No.: US11436852B2
- Publication Date: 2022-09-06
- Inventor: Ranadeep Bhuyan, Shubhajit Saha, Sudipto Ghosh
- Applicant: Intuit Inc.
- Applicant Address: US CA Mountain View
- Assignee: Intuit Inc.
- Current Assignee: Intuit Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Paradice and Li LLP
- Main IPC: G06K9/00
- IPC: G06K9/00 ; G06V30/416 ; G06K9/62 ; G06T11/60 ; G06T7/11 ; G06V10/75 ; G06V30/10

Abstract:
Systems and apparatuses are disclosed for extracting information from document images. An example method includes segmenting a document image into multiple segments and determining formatting information for each segment. Determining formatting information for a segment includes determining one or more features of the segment and comparing the one or more features of the segment to one or more clusters of features associated with different document types. The formatting information for the segment is based on the comparison. The method also includes, for each segment, storing the formatting information in a data structure associated with the segment. The method further includes, for each segment including text to be identified during information extraction, applying OCR to the segment to generate machine-encoded text and storing the machine-encoded text in the associated data structure.
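The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the two-number feature vector, the cluster centroids and labels in `CLUSTERS`, the `Segment` class, and the `extract` and `nearest_cluster` helpers are all hypothetical names chosen for illustration. Segmentation and feature computation are assumed to happen upstream, and the OCR engine is passed in as a plain callable so the sketch stays self-contained. Nearest-centroid matching stands in for the cluster comparison, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical cluster centroids keyed by a formatting label.
# Features are (mean glyph height in px, text density 0..1); the values are made up.
CLUSTERS: Dict[str, Tuple[float, float]] = {
    "invoice_header": (32.0, 0.20),
    "invoice_line_item": (12.0, 0.60),
    "logo_or_figure": (0.0, 0.05),
}


@dataclass
class Segment:
    pixels: Any                       # stand-in for the cropped image region
    features: Tuple[float, float]     # assumed to be computed upstream
    record: Dict[str, str] = field(default_factory=dict)  # per-segment data structure


def nearest_cluster(features: Tuple[float, float]) -> str:
    """Compare a segment's features to each cluster and return the closest label."""
    return min(CLUSTERS, key=lambda name: dist(features, CLUSTERS[name]))


def extract(segments: List[Segment], ocr: Callable[[Any], str]) -> List[Dict[str, str]]:
    """Store formatting info for every segment; run OCR only on text-bearing ones."""
    for seg in segments:
        label = nearest_cluster(seg.features)
        seg.record["formatting"] = label
        # Assumption: segments matched to the figure cluster carry no text to extract.
        if label != "logo_or_figure":
            seg.record["text"] = ocr(seg.pixels)
    return [seg.record for seg in segments]


# Usage with a fake OCR callable in place of a real engine.
segments = [Segment("region-0", (30.0, 0.25)), Segment("region-1", (1.0, 0.04))]
records = extract(segments, ocr=lambda pixels: f"text from {pixels}")
print(records)
```

Passing the OCR step in as a callable keeps the formatting decision separate from text recognition, mirroring the abstract's split between storing formatting information for every segment and applying OCR only where text is to be identified. The features here are unscaled, so glyph height dominates the distance; a real comparison would likely normalize them first.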
Public/Granted literature
- US20220036063A1 DOCUMENT INFORMATION EXTRACTION FOR COMPUTER MANIPULATION, Public/Granted day: 2022-02-03