- Patent Title: Document information extraction system using sequenced comparators
- Application No.: US16740754
- Application Date: 2020-01-13
- Publication No.: US11657101B2
- Publication Date: 2023-05-23
- Inventor: Prabhdeep Singh Walia, Vikas Kushwaha
- Applicant: Goldman Sachs & Co. LLC
- Applicant Address: US NY New York
- Assignee: Goldman Sachs & Co. LLC
- Current Assignee: Goldman Sachs & Co. LLC
- Current Assignee Address: US NY New York
- Agency: Fenwick & West LLP
- Main IPC: G06F16/93
- IPC: G06F16/93; G06F16/22; G06F16/28; G06F16/904; G06F40/103

Abstract:
A document information extraction system determines a structure of an electronic document based on characteristics of the document's constituent elements. The system segments the document to generate elements, each having similar characteristics. Elements may be clustered to assist in determining the document structure. The system determines directional relationships between elements (e.g., above or below). The system then employs a master comparator to determine familial relationships between adjacent elements. The master comparator includes a set of unit comparators, each of which compares a specific characteristic of two elements. The master comparator applies the unit comparators in sequence to determine the familial relationship based on their comparisons. The system outputs a document hierarchy tree reflecting the determined familial relationships; the hierarchy tree represents the structure of the document.
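The abstract describes the sequenced-comparator idea only at a high level. The Python sketch below shows one way a master comparator could run unit comparators in sequence and use the results to build a hierarchy tree. It is not the patented implementation: the characteristics (`y`, `font_size`, `bold`, `indent`), the names `Relation`, `master_compare` and `build_hierarchy`, the "first decisive result wins" rule, and the stack-based tree construction are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Relation(Enum):
    PARENT = "parent"        # earlier element is the parent of the later one
    CHILD = "child"          # earlier element ranks below the later one
    SIBLING = "sibling"      # both elements sit at the same level
    UNDECIDED = "undecided"  # this unit comparator cannot tell


@dataclass
class Element:
    # Hypothetical characteristics chosen for illustration only.
    text: str
    y: float          # vertical position; smaller means higher on the page
    font_size: float
    bold: bool
    indent: float
    children: List["Element"] = field(default_factory=list)


UnitComparator = Callable[[Element, Element], Relation]


def compare_font_size(a: Element, b: Element) -> Relation:
    """Larger text is assumed to rank higher in the hierarchy."""
    if a.font_size != b.font_size:
        return Relation.PARENT if a.font_size > b.font_size else Relation.CHILD
    return Relation.UNDECIDED


def compare_bold(a: Element, b: Element) -> Relation:
    """Bold text is assumed to outrank plain text of the same size."""
    if a.bold != b.bold:
        return Relation.PARENT if a.bold else Relation.CHILD
    return Relation.UNDECIDED


def compare_indent(a: Element, b: Element) -> Relation:
    """Less-indented text is assumed to outrank more-indented text."""
    if a.indent != b.indent:
        return Relation.PARENT if a.indent < b.indent else Relation.CHILD
    return Relation.SIBLING  # last comparator: equal indent means same level


# Order matters: earlier comparators take precedence (an assumption).
COMPARATORS: List[UnitComparator] = [compare_font_size, compare_bold, compare_indent]


def master_compare(a: Element, b: Element, units: List[UnitComparator]) -> Relation:
    """Apply unit comparators in sequence; the first decisive result wins."""
    for unit in units:
        result = unit(a, b)
        if result is not Relation.UNDECIDED:
            return result
    return Relation.SIBLING  # fallback if every comparator abstains


def build_hierarchy(elements: List[Element], units: List[UnitComparator]) -> Element:
    """Attach each element below the nearest preceding element that outranks it."""
    root = Element("ROOT", y=float("-inf"), font_size=float("inf"), bold=True, indent=0.0)
    stack = [root]
    # Sorting by y puts each element directly below its "above" neighbor.
    for elem in sorted(elements, key=lambda e: e.y):
        while len(stack) > 1:
            relation = master_compare(stack[-1], elem, units)
            if relation is Relation.PARENT:
                break          # top of stack is elem's parent
            stack.pop()
            if relation is Relation.SIBLING:
                break          # elem shares the popped element's parent
            # CHILD: popped element ranks below elem, so keep climbing
        stack[-1].children.append(elem)
        stack.append(elem)
    return root


def outline(node: Element, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one per element."""
    lines = ["  " * depth + node.text]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


sample = [
    Element("2. Method", y=50, font_size=16, bold=True, indent=0),
    Element("1. Introduction", y=0, font_size=16, bold=True, indent=0),
    Element("1.1 Scope", y=10, font_size=12, bold=True, indent=0),
    Element("Body text A", y=20, font_size=12, bold=False, indent=0),
    Element("1.2 Terms", y=30, font_size=12, bold=True, indent=0),
    Element("Body text B", y=40, font_size=12, bold=False, indent=0),
]
tree = build_hierarchy(sample, COMPARATORS)
print("\n".join(outline(tree)))
```

For this sample input, "Body text A" ends up nested under "1.1 Scope", both subsections sit under "1. Introduction", and both numbered sections hang directly off `ROOT`. Ordering the unit comparators from strongest to weakest signal (font size, then boldness, then indentation) lets a decisive characteristic settle the relationship before weaker ones are consulted; that ordering is an illustrative choice, not one stated in the source.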
Public/Granted literature:
- US20210216595A1 DOCUMENT INFORMATION EXTRACTION SYSTEM USING SEQUENCED COMPARATORS (Public/Granted day: 2021-07-15)