Invention Grant
- Patent Title: Automatic delineation and extraction of tabular data in portable document format using graph neural networks
- Application No.: US17111392
- Application Date: 2020-12-03
- Publication No.: US11599711B2
- Publication Date: 2023-03-07
- Inventor: Peter Zhong, Antonio Jose Jimeno Yepes
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Stephen R. Yoder
- Main IPC: G06F40/157
- IPC: G06F40/157 ; G06N3/04 ; G06N3/08 ; G06V30/413 ; G06V30/414

Abstract:
Aspects of the present invention disclose a method for automatic delineation and extraction of tabular data in portable document format (PDF). The method includes one or more processors extracting metadata corresponding to tabular data in a text-based portable document format (PDF), wherein the metadata is associated with characters and border lines of the tabular data. The method further includes generating a graph structure corresponding to the tabular data in the text-based PDF based at least in part on the metadata. The method further includes generating a vector representation of the graph structure. The method further includes constructing a tree structure corresponding to the tabular data based at least in part on the vector representation.
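The four claimed steps (metadata, graph, vector representation, tree) can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not IBM's implementation. The `Cell` records stand in for text chunks already pulled from PDF metadata, and border lines are ignored. The graph links cells whose boxes overlap on a shared row or column. The "vector representation" is one round of untrained mean-aggregation message passing rather than a learned graph neural network. The tree is a plain table, rows, cells nesting. All names (`Cell`, `build_graph`, `embed`, `build_tree`, `row_tol`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical stand-in for a text chunk extracted from PDF metadata.
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a0, a1, b0, b1):
    # True when intervals [a0, a1] and [b0, b1] share a positive-length span.
    return min(a1, b1) > max(a0, b0)


def build_graph(cells):
    """Adjacency list linking cells that share a row or a column."""
    adj = {i: [] for i in range(len(cells))}
    for i, a in enumerate(cells):
        for j in range(i + 1, len(cells)):
            b = cells[j]
            same_row = overlaps(a.y0, a.y1, b.y0, b.y1)
            same_col = overlaps(a.x0, a.x1, b.x0, b.x1)
            if same_row or same_col:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def features(c):
    # Geometric node features: x-centre, y-centre, width, height.
    return [(c.x0 + c.x1) / 2, (c.y0 + c.y1) / 2, c.x1 - c.x0, c.y1 - c.y0]


def embed(cells, adj):
    """One untrained message-passing round: own features + neighbour mean."""
    feats = [features(c) for c in cells]
    out = []
    for i, f in enumerate(feats):
        nbrs = adj[i]
        if nbrs:
            mean = [sum(feats[j][k] for j in nbrs) / len(nbrs) for k in range(len(f))]
        else:
            mean = [0.0] * len(f)
        out.append(f + mean)  # 8-dimensional vector per cell
    return out


def build_tree(cells, vecs, row_tol=1.0):
    """Group cells into rows by y-centre, then order each row by x-centre."""
    order = sorted(range(len(cells)), key=lambda i: (vecs[i][1], vecs[i][0]))
    rows, last_y = [], None
    for i in order:
        y = vecs[i][1]
        if last_y is None or abs(y - last_y) > row_tol:
            rows.append([])  # start a new row when the y-centre jumps
        rows[-1].append(i)
        last_y = y
    return {"table": [[cells[i].text for i in sorted(r, key=lambda i: vecs[i][0])]
                      for r in rows]}


if __name__ == "__main__":
    cells = [Cell("Name", 0, 0, 40, 10), Cell("Age", 50, 0, 80, 10),
             Cell("Ann", 0, 20, 40, 30), Cell("31", 50, 20, 80, 30)]
    adj = build_graph(cells)
    print(build_tree(cells, embed(cells, adj)))
```

Running the script on the 2x2 example prints `{'table': [['Name', 'Age'], ['Ann', '31']]}`. A real system would replace the fixed mean aggregation with trained layers and would use border-line metadata to resolve spanning cells, which this sketch does not attempt.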