Automatic delineation and extraction of tabular data in portable document format using graph neural networks
Abstract:
Aspects of the present invention disclose a method for automatic delineation and extraction of tabular data in portable document format (PDF). The method includes one or more processors extracting metadata corresponding to tabular data in a text-based PDF, wherein the metadata is associated with characters and border lines of the tabular data. The method further includes generating a graph structure corresponding to the tabular data in the text-based PDF based at least in part on the metadata. The method further includes generating a vector representation of the graph structure. The method further includes constructing a tree structure corresponding to the tabular data based at least in part on the vector representation.
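The four steps named in the abstract (metadata, graph, vector representation, tree) can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not specify the graph construction, the network, or the tree format, so everything below is an assumption made for illustration. In particular, a single round of mean neighbour aggregation stands in for a trained graph neural network, character positions are given as centre points with y increasing downward, and the names `Char`, `Node`, `build_graph`, `embed`, and `build_tree` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Char:
    """Hypothetical character metadata: glyph text and centre point."""
    text: str
    x: float
    y: float


@dataclass
class Node:
    """Hypothetical tree node: table -> rows -> cells."""
    label: str
    children: list = field(default_factory=list)


def build_graph(chars, v_lines, h_lines):
    """Link two characters unless a border line lies strictly between them."""
    edges = {i: set() for i in range(len(chars))}
    for i, a in enumerate(chars):
        for j in range(i + 1, len(chars)):
            b = chars[j]
            crossed_v = any(min(a.x, b.x) < x < max(a.x, b.x) for x in v_lines)
            crossed_h = any(min(a.y, b.y) < y < max(a.y, b.y) for y in h_lines)
            if not crossed_v and not crossed_h:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def embed(chars, edges):
    """One round of mean aggregation over each node and its neighbours.

    Assumption: this is a stand-in for a trained GNN layer, not the real model.
    """
    vecs = []
    for i in range(len(chars)):
        group = sorted(edges[i] | {i})
        n = len(group)
        vecs.append((sum(chars[j].x for j in group) / n,
                     sum(chars[j].y for j in group) / n))
    return vecs


def build_tree(chars, vecs, v_lines, h_lines):
    """Place each character in a (row, column) cell from its vector, then nest."""
    cells = {}
    for ch, (vx, vy) in zip(chars, vecs):
        row = sum(1 for y in h_lines if y < vy)  # border lines above the vector
        col = sum(1 for x in v_lines if x < vx)  # border lines left of the vector
        cells.setdefault(row, {}).setdefault(col, []).append(ch)
    table = Node("table")
    for row in sorted(cells):
        row_node = Node(f"row {row}")
        for col in sorted(cells[row]):
            members = sorted(cells[row][col], key=lambda c: (c.y, c.x))
            row_node.children.append(Node("".join(c.text for c in members)))
        table.children.append(row_node)
    return table


# Made-up 2x2 table: one vertical border at x=50, one horizontal border at y=20.
chars = [Char("A", 10, 10), Char("B", 16, 10), Char("C", 60, 10),
         Char("1", 10, 30), Char("2", 60, 30), Char("3", 66, 30)]
v_lines, h_lines = [50], [20]

graph = build_graph(chars, v_lines, h_lines)
vectors = embed(chars, graph)
tree = build_tree(chars, vectors, v_lines, h_lines)
rows = [[cell.label for cell in row.children] for row in tree.children]
print(rows)  # [['AB', 'C'], ['1', '23']]
```

In this toy setting, characters that share a cell form a fully connected group, so the aggregation step gives them identical vectors (the cell centroid), and the tree step only needs to compare those vectors with the border positions. A learned model would replace `embed` and would not rely on such clean geometry.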