Identification and extraction of acronym/definition pairs in documents
Abstract:
A method and apparatus that can extract domain-specific acronyms and their definitions from large documents is disclosed. Strings of characters indicative of candidate acronyms within a portion of a document may be identified and extracted. Definitions for each selected string of characters may be extracted from text within the document proximal to that string of characters. Candidate acronym/definition pairs may then be created for each selected string of characters from that string and its extracted definitions. A classification system may be iteratively applied to the candidate acronym/definition pairs to create or update an acronym/definition pair dictionary for the document.
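As an illustration only, the steps in the abstract (identify candidate acronyms, extract definitions from proximal text, form candidate pairs, classify iteratively into a dictionary) can be sketched in Python. The abstract does not say how candidates are detected, how "proximal" text is bounded, or what classifier is used. The parenthesized-acronym pattern, the window of min(len + 5, 2 * len) preceding words, the initial-letter score, and the pass-by-pass threshold schedule below are all assumptions, and every function name is hypothetical. This is not the claimed method.

```python
import re

# Assumed candidate pattern: a parenthesized token of 2-10 characters
# starting with an uppercase letter, e.g. "(NLP)".
ACRONYM_RE = re.compile(r"\(([A-Z][A-Za-z0-9]{1,9})\)")
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def find_candidates(text):
    """Yield (acronym, preceding_words) for each candidate acronym string."""
    for m in ACRONYM_RE.finditer(text):
        acronym = m.group(1)
        if sum(c.isupper() for c in acronym) < 2:
            continue  # require at least two capitals, so "(Fig)" is skipped
        # Assumed "proximal" window: up to min(len + 5, 2 * len) words before "(".
        window = min(len(acronym) + 5, 2 * len(acronym))
        words = WORD_RE.findall(text[: m.start()])[-window:]
        yield acronym, words


def initials_score(acronym, words):
    """Fraction of the acronym's letters matched, in order, by word initials."""
    letters = [c.lower() for c in acronym]
    matched = 0
    for word in words:
        if matched < len(letters) and word[0].lower() == letters[matched]:
            matched += 1
    return matched / len(letters)


def candidate_pairs(text):
    """Build (acronym, definition, score) for every suffix of each window."""
    pairs = []
    for acronym, words in find_candidates(text):
        for start in range(len(words)):
            phrase = words[start:]
            pairs.append((acronym, " ".join(phrase), initials_score(acronym, phrase)))
    return pairs


def build_dictionary(text, thresholds=(1.0, 0.6), dictionary=None):
    """Classify pairs in passes, relaxing the score threshold on each pass.

    Entries already present in `dictionary` (or accepted in an earlier pass)
    are kept, so the same call can create or update a dictionary.
    """
    dictionary = dict(dictionary or {})
    pairs = candidate_pairs(text)
    for threshold in thresholds:
        for acronym in {a for a, _, _ in pairs}:
            if acronym in dictionary:
                continue  # already resolved
            scored = [(s, d) for a, d, s in pairs if a == acronym and s >= threshold]
            if scored:
                # Prefer the highest score, then the shortest definition text.
                best = max(scored, key=lambda sd: (sd[0], -len(sd[1])))
                dictionary[acronym] = best[1]
    return dictionary
```

For example, `build_dictionary("We apply natural language processing (NLP) to patents.")` returns `{'NLP': 'natural language processing'}`: the longer suffixes "apply natural language processing" and "We apply natural language processing" also score 1.0, but the shortest full match is preferred. The relaxed second pass lets a plural such as "fast Fourier transforms (FFTs)" (score 0.75) be accepted only when no exact match exists.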