Invention Grant
- Patent Title: Hybrid approach for short form detection and expansion to long forms
- Application No.: US15195442
- Application Date: 2016-06-28
- Publication No.: US10083170B2
- Publication Date: 2018-09-25
- Inventor: Md Faisal M. Chowdhury, Michael R. Glass, Alfio M. Gliozzo
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Pepper Hamilton LLP
- Main IPC: G06F17/27
- IPC: G06F17/27; G06F17/22

Abstract:
Embodiments provide a system and method for short form and long form detection. Given candidate short forms, the system can generate one or more n-gram combinations, resulting in one or more candidate short form and n-gram combination pairs. For each candidate short form and n-gram combination pair, the system can calculate an approximate string matching distance, calculate a best possible alignment score, calculate a confidence score, calculate a topic similarity score, and calculate a semantic similarity score. The system can determine the validity, through a meta learner, of the one or more valid candidate short form and n-gram combination pairs based upon each short form and n-gram combination pair's confidence score, topic similarity score, and semantic similarity score, and store the valid short form and n-gram combination pairs in a repository. The system has no language-specific constraints and can extract short form and long form pairs from documents written in various languages. It also works whether or not the language of the given corpus is case sensitive.
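
The first two steps named in the abstract (generating n-gram combinations for a candidate short form, then computing an approximate string matching distance for each pair) can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the use of Levenshtein distance as the approximate matching measure, the comparison against the initial letters of each n-gram, and the case folding are all assumptions made for illustration. The alignment, confidence, topic similarity, and semantic similarity scores and the meta learner are not shown.

```python
from typing import List, Tuple


def ngram_candidates(tokens: List[str], max_n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram of length 1..max_n (hypothetical helper)."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, assumed here as the approximate matching measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_pairs(short_form: str, context_tokens: List[str],
               max_n: int = 4) -> List[Tuple[int, str]]:
    """Pair a short form with each n-gram and sort by distance (smallest first).

    Assumption: the short form is compared with the n-gram's initial letters,
    after case folding. The abstract does not specify either choice.
    """
    pairs = []
    for ngram in ngram_candidates(context_tokens, max_n):
        initials = "".join(word[0] for word in ngram if word)
        dist = edit_distance(short_form.casefold(), initials.casefold())
        pairs.append((dist, " ".join(ngram)))
    return sorted(pairs)


if __name__ == "__main__":
    tokens = "natural language processing is fun".split()
    for dist, candidate in rank_pairs("NLP", tokens)[:3]:
        print(dist, candidate)
```

With the sample tokens above, the pair ranked first is "natural language processing" at distance 0. In a fuller pipeline, a distance like this would be only one of several signals passed on to the later scoring stages.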
Public/Granted literature
- US20170371857A1 HYBRID APPROACH FOR SHORT FORM DETECTION AND EXPANSION TO LONG FORMS (Public/Granted day: 2017-12-28)