Invention Grant
- Patent Title: Information extraction from open-ended schema-less tables
- Application No.: US16145676
- Application Date: 2018-09-28
- Publication No.: US10740545B2
- Publication Date: 2020-08-11
- Inventor: Joshua Allen, Andrew R. Freed, Thai T. La
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Tutunjian & Bitetto, P.C.
- Agent: Nicole Spence
- Main IPC: G06F40/169
- IPC: G06F40/169; G06F40/216; G06F40/18; G06F16/31; G06F16/33; G06F16/35; G06K9/00

Abstract:
Systems and methods for generating and annotating cell documents include extracting tables from a document using a table extraction engine. Headers are extracted for each of the tables using a header detection engine. Cells are extracted from each of the tables using a cell extraction engine. A cell document is generated for each of the cells which are each correlated to corresponding portions of the headers, each cell document recording the correlation between the cells and the headers. Each cell document is annotated to generate annotated cell documents with a cell recognition model trained to perform natural language processing on the cell documents by classifying each term in each of the cell documents and extracting relationships between the terms of each of the cell documents.
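
The cell-document step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `CellDocument` and `build_cell_documents` are hypothetical, and the sketch assumes the simplest possible layout (first row holds the column headers, first column holds the row headers) in place of the header detection engine. The table extraction and annotation stages are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellDocument:
    """Hypothetical record: one cell plus the headers it is correlated to."""
    row_header: str
    column_header: str
    value: str

    def as_text(self) -> str:
        # Flatten the cell/header correlation into a short text that a
        # downstream NLP model could classify term by term.
        return f"{self.row_header} | {self.column_header} | {self.value}"


def build_cell_documents(table: List[List[str]]) -> List[CellDocument]:
    """Generate one cell document per data cell.

    Assumption for illustration only: row 0 contains the column headers
    and column 0 contains the row headers.
    """
    if not table:
        return []
    column_headers = table[0][1:]
    documents = []
    for row in table[1:]:
        if not row:
            continue  # skip empty rows
        row_header, cells = row[0], row[1:]
        # Pair each data cell with the column header above it.
        for column_header, value in zip(column_headers, cells):
            documents.append(CellDocument(row_header, column_header, value))
    return documents


# Made-up example table, used only to exercise the sketch.
table = [
    ["Item", "2017", "2018"],
    ["Revenue", "10", "12"],
    ["Cost", "7", "8"],
]
docs = build_cell_documents(table)
```

Each resulting document carries its own header context, so a cell can be annotated without access to the rest of the table.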
Public/Granted literature
- US20200104350A1 INFORMATION EXTRACTION FROM OPEN-ENDED SCHEMA-LESS TABLES Public/Granted day: 2020-04-02