Electronic document content extraction and document type determination
Abstract:
A system and method include receiving content of an electronic document having a document type, the content being divided into components that each have a unique identifier, and selecting an extraction schema based on the document type, the extraction schema having a plurality of data categories. For each of the components, the extraction schema is applied to identify content of the component that corresponds to individual ones of the data categories, and category metadata indicative of that content is saved, with a processor, in an electronic data storage in a record associated with the component. In response to obtaining the category metadata for each of the components, the extraction schema is applied to the category metadata of each of the components and to the electronic document as a whole to determine document metadata. A user interface displays the document metadata.
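The flow described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the `Component` class, the `SCHEMAS` table, the regular-expression rules, the in-memory `records` dictionary standing in for the electronic data storage, and the `component_count` field are all hypothetical names and choices introduced only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Component:
    """One piece of the document's content, with a unique identifier."""
    component_id: str
    text: str


# Hypothetical extraction schemas: document type -> {data category -> regex}.
# Regex matching is an assumed stand-in for whatever rules a real schema uses.
SCHEMAS = {
    "invoice": {
        "date": r"\d{4}-\d{2}-\d{2}",
        "amount": r"\$\d+(?:\.\d{2})?",
    },
}


def extract_component_metadata(doc_type, components, records):
    """Apply the schema to each component and save its category metadata
    in a record keyed by the component's identifier."""
    schema = SCHEMAS[doc_type]  # schema selected from the document type
    for comp in components:
        records[comp.component_id] = {
            category: re.findall(pattern, comp.text)
            for category, pattern in schema.items()
        }
    return records


def determine_document_metadata(doc_type, components, records):
    """Combine per-component category metadata with a whole-document pass."""
    schema = SCHEMAS[doc_type]
    document_metadata = {
        category: [
            value
            for comp in components
            for value in records[comp.component_id][category]
        ]
        for category in schema
    }
    # Whole-document property (assumed example): number of components.
    document_metadata["component_count"] = len(components)
    return document_metadata


if __name__ == "__main__":
    parts = [
        Component("c1", "Invoice dated 2024-01-15"),
        Component("c2", "Total due: $120.50"),
    ]
    store = extract_component_metadata("invoice", parts, {})
    # A real system would render this in a user interface; here it is printed.
    print(determine_document_metadata("invoice", parts, store))
```

The two-pass split mirrors the abstract's ordering: document metadata is computed only after category metadata exists for every component, so the second function reads from the saved records rather than re-scanning the raw text.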