Invention Grant
- Patent Title: Extracting structured information from a document containing filled form images
- Application No.: US16192028
- Application Date: 2018-11-15
- Publication No.: US10755039B2
- Publication Date: 2020-08-25
- Inventor: Antonio Foncubierta Rodriguez, Guillaume Jaume, Maria Gabrani
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Fleit Intellectual Property Law
- Agent: Jon Gibbons
- Main IPC: G06F17/00
- IPC: G06F17/00; G06F40/174; G06K9/00

Abstract:
A system and process for extracting information from filled form images is described. In one example, the claimed invention first extracts the textual information and the hierarchy from a blank form. This information is then used to extract and understand the content of filled forms, so the system does not have to analyze each filled form from scratch. The system is designed to remain as generic as possible: the number of hard-coded rules in the whole pipeline is minimized to offer an adaptive solution that can handle as many forms as possible, with various structures and typography. The system is also designed to be integrated as a built-in function in a larger pipeline, and the form understanding pipeline could be the starting point of any advanced Natural Language Processing application.
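The two-stage idea in the abstract (analyze the blank form once, then reuse that result for every filled copy) can be illustrated with a minimal sketch. This is not the patented method: the `Box`, `Field`, and `Word` types, the `extract_filled` function, and the rule of assigning a word to a field when the word's position falls inside that field's answer region are all assumptions made for illustration. The sketch also assumes that text and coordinates have already been obtained from the images by some OCR step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


@dataclass(frozen=True)
class Field:
    """One entry of the blank-form template: a label under a section header."""
    section: str     # parent header in the blank form's hierarchy
    label: str       # printed question or label text
    answer_box: Box  # region where a filled-in answer is expected


@dataclass(frozen=True)
class Word:
    """One word recognized on a filled form, with its position."""
    text: str
    x: float
    y: float


def extract_filled(template, words):
    """Map the words of a filled form onto the blank-form template.

    The template is built once from the blank form; each filled form only
    needs this lookup instead of a full structural analysis.
    """
    result = {}
    for field in template:
        inside = [w for w in words if field.answer_box.contains(w.x, w.y)]
        inside.sort(key=lambda w: (w.y, w.x))  # top-to-bottom, then left-to-right
        answer = " ".join(w.text for w in inside)
        result.setdefault(field.section, {})[field.label] = answer
    return result


# Template derived once from a blank form (values are made up).
template = [
    Field("Applicant", "Name", Box(100, 0, 300, 20)),
    Field("Applicant", "City", Box(100, 30, 300, 50)),
]

# Words recognized on one filled copy of that form (values are made up).
words = [
    Word("Doe", 160, 10),
    Word("Jane", 110, 10),
    Word("Armonk", 120, 40),
    Word("Name", 10, 10),  # printed label, outside every answer box
]

structured = extract_filled(template, words)
```

Keeping the template separate from the per-form lookup mirrors the abstract's point that a filled form need not be analyzed from the beginning; a real system would also have to handle misaligned scans, handwriting, and answers that spill outside their expected regions.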
Public/Granted literature
- US20200159820A1 EXTRACTING STRUCTURED INFORMATION FROM A DOCUMENT CONTAINING FILLED FORM IMAGES Public/Granted day: 2020-05-21