Invention Grant
- Patent Title: Automatic transformation of complex tables in documents into computer understandable structured format and providing schema-less query support data extraction
- Application No.: US16389073
- Application Date: 2019-04-19
- Publication No.: US11194797B2
- Publication Date: 2021-12-07
- Inventor: Mustafa Canim, Cristina Cornelio, Arun Iyengar, Ryan A. Musa, Mariano Rodriguez Muro
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Fleit Intellectual Property Law
- Agent: Jose Gutman
- Main IPC: G06F16/00
- IPC: G06F16/00; G06F16/2452; G06F16/2458; G06F16/28; G06F16/25; G06F16/22; G06F16/2455; G06F40/157

Abstract:
An information processing system, a computer-readable storage medium, and a computer-implemented method collect tables from a corpus of documents and convert the collected tables to a flattened table format that is organized to be searchable by schema-less queries. The method collects tables and, for each collected table, extracts feature values from the collected table data and the collected table meta-data. A table classifier classifies each collected table as being a type of table. Based on the classifying, the collected table is converted to a flattened table whose table values are the table data and the table meta-data of the collected table. Dependencies of the data values are mapped. The flattened table and the mapped dependencies are stored in a triple store searchable by schema-less queries. The table classifier learns and improves its accuracy and reliability. Dependency information is maintained among a plurality of database tables and can be updated at a variable update frequency.
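The flatten-then-store idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: it assumes the simplest table type (one header row, one label column), skips the classifier entirely, and uses an in-memory list in place of a real triple store. The names `Triple`, `flatten_table`, `to_triples`, and `query`, as well as the sample drug table, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One (subject, predicate, object) statement; a stand-in for a triple-store row."""
    subject: str
    predicate: str
    obj: str


def flatten_table(table_id, header, rows):
    """Flatten a simple table into one record per data cell.

    Assumption: the first column holds row labels and the first row holds
    column headers. Each record keeps the cell value together with the
    meta-data (row label, column header) it depends on.
    """
    flat = []
    for row in rows:
        row_label = row[0]
        for col_name, value in zip(header[1:], row[1:]):
            flat.append(
                {"table": table_id, "row": row_label, "column": col_name, "value": value}
            )
    return flat


def to_triples(flat):
    """Emit one node per cell and link it to its table, row, column, and value."""
    triples = []
    for i, cell in enumerate(flat):
        node = f"{cell['table']}/cell{i}"
        for key in ("table", "row", "column", "value"):
            triples.append(Triple(node, key, cell[key]))
    return triples


def query(triples, *terms):
    """Schema-less lookup: the caller gives labels without saying which is a
    row label and which is a column header. Returns the values of all cells
    whose row/column labels contain every term (case-insensitive)."""
    by_node = {}
    for t in triples:
        by_node.setdefault(t.subject, {})[t.predicate] = t.obj
    wanted = {term.lower() for term in terms}
    hits = []
    for fields in by_node.values():
        labels = {fields["row"].lower(), fields["column"].lower()}
        if wanted <= labels:
            hits.append(fields["value"])
    return hits


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    header = ["Drug", "Dose (mg)", "Frequency"]
    rows = [["Aspirin", "100", "daily"], ["Ibuprofen", "200", "twice daily"]]
    store = to_triples(flatten_table("t1", header, rows))
    print(query(store, "ibuprofen", "dose (mg)"))  # prints ['200']
```

Attaching every value to a per-cell node is one simple way to keep the value's dependency on its row and column labels explicit, so a query does not need to know the original table layout.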