Automatic selection of templates for extraction of data from electronic documents
Abstract:
A computer-implemented method for automatic template selection for extracting data from an input electronic document is provided. The method includes receiving a first set of candidate templates and an input electronic document. For each candidate template, a template similarity ratio is calculated that represents the similarity of the candidate template to the input electronic document. The first set of candidate templates is ranked according to the template similarity ratios and then matched to the input electronic document, generating a normalized similarity score for each candidate template. Differences in the normalized similarity scores of successive pairs of ranked candidate templates are determined, and a breaking point is established. A second set of candidate templates is formed by selecting the candidate templates that are ranked above the breaking point. Data from the input electronic document is extracted using the second set of candidate templates.
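The selection steps described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the claimed method: templates and documents are treated as plain strings, the similarity ratio is taken from Python's `difflib.SequenceMatcher`, the matching step is reduced to normalizing those ratios by the top score, and the breaking point is placed at the largest drop between successive scores. The function names (`similarity_ratio`, `normalize`, `find_breaking_point`, `select_templates`) are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_ratio(template: str, document: str) -> float:
    """Assumed stand-in for the template similarity ratio (0.0 to 1.0)."""
    return SequenceMatcher(None, template, document).ratio()


def normalize(scores: list[float]) -> list[float]:
    """Scale scores so the best one is 1.0 (assumed normalization)."""
    top = max(scores, default=0.0)
    return [s / top if top > 0 else 0.0 for s in scores]


def find_breaking_point(scores: list[float]) -> int:
    """Return how many top-ranked entries lie above the breaking point.

    `scores` must be sorted in descending order. The breaking point is
    assumed to be the largest drop between successive scores.
    """
    if len(scores) < 2:
        return len(scores)
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    return drops.index(max(drops)) + 1


def select_templates(templates: dict[str, str], document: str) -> list[str]:
    """Rank the first set of templates and return the second set by name."""
    ratios = {
        name: similarity_ratio(text, document)
        for name, text in templates.items()
    }
    # Rank candidates from most to least similar.
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    # Simplification: the matching step reuses the ratios, normalized.
    scores = normalize([ratios[name] for name in ranked])
    keep = find_breaking_point(scores)
    return ranked[:keep]
```

For ranked scores of 1.0, 0.95, 0.4 and 0.35, the largest drop falls between the second and third entries, so `find_breaking_point` keeps the first two templates. A real system would replace the string comparison with a layout-aware or field-aware matcher; only the rank, difference, and cut-off structure is meant to carry over.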