Invention Grant
US08843815B2 System and method for automatically extracting metadata from unstructured electronic documents (In force)

Abstract:
A system and method for automatically extracting metadata from unstructured electronic documents are disclosed. In one embodiment, the unstructured electronic document is converted into a plain text document. A document header of the unstructured electronic document is then extracted from the plain text document using a rule-based document header extractor. The extractor may be based on a rule that includes determining, for a text line in the plain text document, the ratio of the number of words whose initial letter is capitalized to the total number of words in that line. Metadata is then extracted from the extracted document header using a heuristic approach.
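The capitalization-ratio rule described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names `capitalized_ratio` and `extract_header`, the 0.5 threshold, the 20-line scan window, and the choice to stop at the first line that fails the rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List


def capitalized_ratio(line: str) -> float:
    """Ratio of words whose first character is uppercase to all words in the line."""
    words = line.split()
    if not words:
        return 0.0
    capitalized = sum(1 for word in words if word[0].isupper())
    return capitalized / len(words)


def extract_header(plain_text: str, threshold: float = 0.5, max_lines: int = 20) -> List[str]:
    """Collect leading lines whose capitalization ratio meets the threshold.

    Assumptions (not from the source): headers sit at the top of the text,
    blank lines are skipped, and the header ends at the first line that
    falls below the threshold.
    """
    header: List[str] = []
    for line in plain_text.splitlines()[:max_lines]:
        if not line.strip():
            continue  # skip blank lines between header fields
        if capitalized_ratio(line) >= threshold:
            header.append(line.strip())
        else:
            break  # first body-like line ends the header
    return header


sample = (
    "Automatic Metadata Extraction\n"
    "Jane Doe\n"
    "\n"
    "this paper describes a method for processing documents.\n"
    "More Text Here\n"
)
header_lines = extract_header(sample)
print(header_lines)
```

Title and author lines tend to capitalize most words, while body sentences usually capitalize only the first, so a per-line ratio gives a cheap signal for separating the two. A real extractor would likely combine this rule with others, as the abstract's wording ("may be based on a rule that includes") suggests.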