Method and system for data extraction from images of semi-structured documents

Invention Grant

US09754176B2 Method and system for data extraction from images of semi-structured documents (in force)

Patent Title: Method and system for data extraction from images of semi-structured documents
Application No.: US14868683

Application Date: 2015-09-29
Publication No.: US09754176B2

Publication Date: 2017-09-05
Inventor: Mikhail Kostyukov
Applicant: ABBYY Development LLC
Applicant Address: RU Moscow
Assignee: ABBYY PRODUCTION LLC
Current Assignee: ABBYY PRODUCTION LLC
Current Assignee Address: RU Moscow
Agency: Lowenstein Sandler LLP
Priority: RU2015137956 20150907
Main IPC: G06K9/18
IPC: G06K9/18 ; G06K9/46

Abstract:

The present invention is directed to a method of extracting data from fields in an image of a document. In one implementation, a text representation of the image of the document is obtained. A graph is constructed to store the features of the text fragments in that text representation, together with the links between the fragments. A cascade classification is run to compute the features of the text fragments and of their links. Hypotheses are generated about which fields in the image of the document the text fragments belong to. Combinations of the hypotheses are generated, and one combination is selected. Finally, data is extracted from the fields in the image of the document based on the selected combination of hypotheses.
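
The steps named in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the `Fragment` class, the two-stage "cascade", the scoring weights, and the `date`/`total` field names are all assumptions made up for illustration. It only shows the general shape of the pipeline: build a graph of fragment links, compute features in stages, generate per-field hypotheses, enumerate combinations, and extract from the best-scoring one.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Fragment:
    """A recognized text fragment with a toy position (hypothetical structure)."""
    text: str
    x: int
    y: int
    features: dict = field(default_factory=dict)


def build_graph(fragments):
    """Link each fragment to its nearest left neighbour on the same line."""
    links = {}
    for i, frag in enumerate(fragments):
        left = [j for j, other in enumerate(fragments)
                if other.y == frag.y and other.x < frag.x]
        links[i] = max(left, key=lambda j: fragments[j].x) if left else None
    return links


def run_cascade(fragments, links):
    """Two assumed stages: text-only features first, then link-based features."""
    for frag in fragments:
        digits = frag.text.replace(".", "").replace("/", "")
        frag.features["is_number"] = digits.isdigit()
        frag.features["is_date_like"] = frag.text.count("/") == 2 and digits.isdigit()
    for i, frag in enumerate(fragments):
        left = links[i]
        label = fragments[left].text.lower().rstrip(":") if left is not None else ""
        frag.features["label"] = label


def generate_hypotheses(fragments):
    """Return, per field, a list of (fragment index, score) hypotheses."""
    hyps = {"date": [], "total": []}
    for i, frag in enumerate(fragments):
        f = frag.features
        if f["is_date_like"]:
            hyps["date"].append((i, 1.0 + (f["label"] == "date")))
        elif f["is_number"]:
            hyps["total"].append((i, 0.5 + (f["label"] == "total")))
    return hyps


def select_combination(hyps):
    """Enumerate one hypothesis per field and keep the highest total score."""
    names = list(hyps)
    best, best_score = None, float("-inf")
    for combo in product(*(hyps[name] for name in names)):
        indices = [i for i, _ in combo]
        if len(set(indices)) < len(indices):
            continue  # a fragment may fill only one field
        score = sum(s for _, s in combo)
        if score > best_score:
            best, best_score = dict(zip(names, indices)), score
    return best


def extract(fragments):
    links = build_graph(fragments)
    run_cascade(fragments, links)
    best = select_combination(generate_hypotheses(fragments))
    return {name: fragments[i].text for name, i in best.items()} if best else {}
```

For example, given the fragments `Date:`, `09/29/2015`, `Total:`, `42.50` and a stray `7`, calling `extract` picks the date-like fragment for `date` and prefers `42.50` over `7` for `total`, because only `42.50` sits next to a matching label. In this simplified version, a field with no hypotheses at all makes the combination set empty, so `extract` returns an empty dictionary.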

Public/Granted literature

US20170068866A1 METHOD AND SYSTEM FOR DATA EXTRACTION FROM IMAGES OF SEMI-STRUCTURED DOCUMENTS
Public/Granted day: 2017-03-09
