Invention Grant
- Patent Title: Recombining incorrectly separated tokens in natural language processing
- Application No.: US14683504
- Application Date: 2015-04-10
- Publication No.: US09710450B2
- Publication Date: 2017-07-18
- Inventor: Barton W. Emanuel, Ahmed M. A. Nassar, Sarbajit K. Rakshit, Craig M. Trim, Albert T. Wong
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Garg Law Firm, PLLC
- Agent: Rakesh Garg; Christopher K. McLan
- Main IPC: G06F17/20
- IPC: G06F17/20; G06F17/28; G06F17/27

Abstract:
To recombine incorrectly separated tokens in NLP, a determination is made whether a token from an ordered set of tokens is present in a dictionary related to a corpus from which the ordered set is extracted. When the token is not present in the dictionary, and when a compounding threshold has not been reached, the token is agglutinated with a next adjacent token in the ordered set to form the compound token. The compounding threshold limits a number of tokens that can be agglutinated to form a compound token. A determination is made whether the compound token is present in the dictionary. A weight is assigned to the compound token when the compound token is present in the dictionary and a confidence rating of the compound token is computed as a function of the weight. The compound token and the confidence rating are used in NLP of the corpus.
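The procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `recombine_tokens`, the default threshold of 3, the use of plain string concatenation for agglutination, the weight formula (`1 / number of tokens joined`), and the choice to use the weight directly as the confidence rating are all assumptions made for illustration. The abstract states only that a weight is assigned and that the confidence rating is a function of that weight.

```python
from typing import List, Set, Tuple


def recombine_tokens(
    tokens: List[str],
    dictionary: Set[str],
    compounding_threshold: int = 3,  # hypothetical default: max tokens per compound
) -> List[Tuple[str, float]]:
    """Return (token, confidence) pairs, rejoining tokens that were split in error."""
    results: List[Tuple[str, float]] = []
    i = 0
    while i < len(tokens):
        token = tokens[i]

        # A token already in the dictionary needs no recombination.
        if token in dictionary:
            results.append((token, 1.0))
            i += 1
            continue

        # Agglutinate with the next adjacent tokens until the compound is
        # found in the dictionary or the compounding threshold is reached.
        compound = token
        used = 1  # number of tokens joined so far
        matched = False
        while used < compounding_threshold and i + used < len(tokens):
            compound += tokens[i + used]
            used += 1
            if compound in dictionary:
                # Assumed weighting: compounds built from fewer tokens weigh more.
                weight = 1.0 / used
                # Assumed confidence function: the identity on the weight.
                confidence = weight
                results.append((compound, confidence))
                matched = True
                break

        if matched:
            i += used  # skip every token absorbed into the compound
        else:
            # No dictionary match within the threshold: keep the token as-is.
            results.append((token, 0.0))
            i += 1
    return results


if __name__ == "__main__":
    corpus_dictionary = {"the", "database", "is", "online"}
    ordered_tokens = ["the", "data", "base", "is", "on", "line"]
    print(recombine_tokens(ordered_tokens, corpus_dictionary))
```

In this sketch, `data` and `base` are rejoined into `database` because neither the lone token `data` nor the lone token `on` appears in the dictionary, while their two-token compounds do. Lowering `compounding_threshold` restricts how many adjacent tokens may be joined, which bounds the search cost per token.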
Public/Granted literature
- US20160299885A1 RECOMBINING INCORRECTLY SEPARATED TOKENS IN NATURAL LANGUAGE PROCESSING (Public/Granted day: 2016-10-13)