Recombining incorrectly separated tokens in natural language processing
Abstract:
To recombine incorrectly separated tokens in NLP, a determination is made whether a token from an ordered set of tokens is present in a dictionary related to a corpus from which the ordered set is extracted. When the token is not present in the dictionary, and when a compounding threshold has not been reached, the token is agglutinated with a next adjacent token in the ordered set to form a compound token. The compounding threshold limits the number of tokens that can be agglutinated to form a compound token. A determination is made whether the compound token is present in the dictionary. A weight is assigned to the compound token when the compound token is present in the dictionary, and a confidence rating of the compound token is computed as a function of the weight. The compound token and the confidence rating are used in NLP of the corpus.
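The procedure described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the function name `recombine`, the tuple output, and the choice of confidence function (here simply the weight, taken as the fraction of the threshold consumed) are all assumptions introduced for illustration.

```python
def recombine(tokens, dictionary, threshold):
    """Hypothetical sketch of the token-recombination procedure.

    tokens     -- ordered list of tokens extracted from a corpus
    dictionary -- set of known words related to that corpus
    threshold  -- max number of tokens that may form one compound
    """
    result = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in dictionary:
            # Token is already a dictionary word; keep it as-is.
            result.append((token, 1.0))
            i += 1
            continue
        # Agglutinate with adjacent tokens until the compounding
        # threshold is reached or a dictionary match is found.
        compound = token
        count = 1
        j = i + 1
        matched = False
        while count < threshold and j < len(tokens):
            compound += tokens[j]
            count += 1
            j += 1
            if compound in dictionary:
                # Assumed weighting scheme: weight grows with the
                # number of agglutinated tokens; the confidence
                # rating is a (placeholder) function of that weight.
                weight = count / threshold
                confidence = weight
                result.append((compound, confidence))
                i = j
                matched = True
                break
        if not matched:
            # No dictionary match within the threshold; pass the
            # token through with zero confidence.
            result.append((token, 0.0))
            i += 1
    return result
```

For example, with the dictionary `{"natural", "language"}` and a threshold of 2, the split input `["natu", "ral", "language"]` is recombined into `natural` (confidence 1.0) while `language` passes through unchanged.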