Document structure identification using post-processing error correction
Abstract:
Techniques are disclosed for identifying structural elements of a document and correcting errors in the classification and/or location of the identified structural elements. An example method includes determining a location and a classification for a structural element on a page of the document using a machine learning (ML) model; determining one or more errors in the location and/or classification for the structural element; and correcting each of the one or more errors using other content in the document (e.g., content spatially adjacent to the corresponding structural element on the page of the document). The method may further include storing the document together with the corrected location and classification, and/or generating a structural map of the page of the document based on the corrected location and classification. Using the document's own content to correct errors improves the agreement between the identified structural elements and the original document.
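The post-processing step described in the abstract can be illustrated with a minimal sketch. The abstract does not disclose the specific correction rules, so everything below is an assumption made for illustration: the `Element` type, the `correct_location` and `correct_classification` helpers, the 0.5 overlap threshold, the 10-unit gap, and the "text starting with 'Table' next to a table is a caption" rule are all hypothetical. The sketch takes an ML-predicted element as given and corrects it using spatially adjacent page content.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left


@dataclass(frozen=True)
class Element:
    """Hypothetical container for one ML-predicted structural element."""
    box: Box
    label: str
    text: str = ""


def overlap_fraction(line: Box, box: Box) -> float:
    """Fraction of the text line's area that lies inside the predicted box."""
    w = min(line[2], box[2]) - max(line[0], box[0])
    h = min(line[3], box[3]) - max(line[1], box[1])
    area = (line[2] - line[0]) * (line[3] - line[1])
    if w <= 0 or h <= 0 or area <= 0:
        return 0.0
    return (w * h) / area


def correct_location(elem: Element, lines: List[Box],
                     min_overlap: float = 0.5) -> Element:
    """Assumed rule: snap the box to the text lines it mostly covers."""
    kept = [l for l in lines if overlap_fraction(l, elem.box) >= min_overlap]
    if not kept:
        return elem  # no text line to anchor on; keep the prediction
    snapped = (min(l[0] for l in kept), min(l[1] for l in kept),
               max(l[2] for l in kept), max(l[3] for l in kept))
    return replace(elem, box=snapped)


def vertical_gap(a: Box, b: Box) -> float:
    """Vertical distance between two boxes (0 if they overlap vertically)."""
    return max(b[1] - a[3], a[1] - b[3], 0.0)


def correct_classification(elem: Element, neighbors: List[Element],
                           max_gap: float = 10.0) -> Element:
    """Assumed rule: a 'paragraph' that starts with 'Table' and sits
    within max_gap of a table element is relabeled as a caption."""
    if elem.label != "paragraph" or not elem.text.startswith("Table"):
        return elem
    for other in neighbors:
        if other.label == "table" and vertical_gap(elem.box, other.box) <= max_gap:
            return replace(elem, label="caption")
    return elem
```

In use, each predicted element would be passed through `correct_location` with the page's text-line boxes and then through `correct_classification` with the other elements on the page; the corrected elements could then be stored or assembled into a structural map of the page.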