Invention Grant
- Patent Title: Hybrid approach for short form detection and expansion to long forms
- Application No.: US16023697
- Application Date: 2018-06-29
- Publication No.: US10282421B2
- Publication Date: 2019-05-07
- Inventor: Md Faisal M. Chowdhury, Michael R. Glass, Alfio M. Gliozzo
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Pepper Hamilton LLP
- Main IPC: G06F17/27
- IPC: G06F17/27

Abstract:
Embodiments provide a system and method for short form and long form detection. Using a language-independent process, the detection system can ingest a corpus of documents, pre-process those documents by tokenizing them and performing a part-of-speech analysis, and filter one or more candidate short forms using one or more filters that select for semantic criteria. Semantic criteria can include the part of speech of a token, whether the token contains more than a predetermined number of symbols or digits, whether the token appears too frequently in the corpus of documents, and whether the token has at least one uppercase letter. The detection system can detect short forms independent of case and punctuation, and independent of language-specific metaphone variants.
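The candidate filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `candidate_short_forms`, the threshold values, the allowed part-of-speech tags, and the use of a symbol/digit ratio (rather than a count) are all assumptions made for illustration. The abstract only states that the thresholds are predetermined. Tokens are assumed to arrive already tokenized and tagged.

```python
from collections import Counter

# Hypothetical settings; the abstract does not give concrete values.
MAX_NON_ALPHA_RATIO = 0.5       # assumed cap on the share of symbols/digits in a token
MAX_RELATIVE_FREQUENCY = 0.2    # assumed cap on a token's share of the corpus
ALLOWED_POS = {"NOUN", "PROPN"}  # assumed part-of-speech tags for candidates


def candidate_short_forms(tagged_tokens):
    """Return candidate short forms from (token, pos_tag) pairs.

    Applies the four criteria named in the abstract: part of speech,
    symbol/digit content, corpus frequency, and presence of an
    uppercase letter. Each distinct token is reported once, in order
    of first appearance.
    """
    counts = Counter(token for token, _ in tagged_tokens)
    total = len(tagged_tokens)
    seen = set()
    candidates = []
    for token, pos in tagged_tokens:
        if not token or token in seen:
            continue
        seen.add(token)
        # Filter 1: part of speech.
        if pos not in ALLOWED_POS:
            continue
        # Filter 2: too many symbols or digits.
        non_alpha = sum(1 for ch in token if not ch.isalpha())
        if non_alpha / len(token) > MAX_NON_ALPHA_RATIO:
            continue
        # Filter 3: token appears too frequently in the corpus.
        if counts[token] / total > MAX_RELATIVE_FREQUENCY:
            continue
        # Filter 4: at least one uppercase letter.
        if not any(ch.isupper() for ch in token):
            continue
        candidates.append(token)
    return candidates
```

In this sketch the filters run from cheapest to most specific, and a token that fails any one of them is dropped. A real system would tune the thresholds against the ingested corpus.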
Public/Granted literature
- US20180307681A1 HYBRID APPROACH FOR SHORT FORM DETECTION AND EXPANSION TO LONG FORMS Public/Granted day: 2018-10-25