Invention Grant
- Patent Title: Compressing data for natural language processing
- Patent Title (Chinese): 压缩自然语言处理的数据
- Application No.: US14026240
- Application Date: 2013-09-13
- Publication No.: US09146918B2
- Publication Date: 2015-09-29
- Inventor: Yousuf Mohamed Ashparie, Aaron Keith Baughman
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Garg Law Firm, PLLC
- Agent: Rakesh Garg; Matthew Chung
- Main IPC: G06F17/30
- IPC: G06F17/30; G06F7/00; G06F17/28

Abstract:
Data pertaining to a subject matter domain, a set of text strings forming a set of seeds, a description of a linguistic structure present in a language of the domain-related data, and a statistical model applicable to the domain-related data are received. A set of portions of the domain-related data is extracted, a portion in the set of portions forming a nugget. A nugget matches the statistical model according to a criterion, and conforms to the linguistic structure within a threshold degree. The nugget is scored according to a subset of a set of features found in the nuggets. A subset of nuggets is selected. A score of each nugget included in the subset of nuggets exceeds a score threshold. The subset of nuggets is combined to form a pseudo-document. The pseudo-document is submitted to an application for answering a question related to the domain.
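The flow described in the abstract (extract nuggets, score them, keep those above a score threshold, and combine them into a pseudo-document) can be illustrated with a minimal Python sketch. This is not the patented method: the sentence splitter, the seed-matching filter (a stand-in for the statistical-model and linguistic-structure checks), the seed-coverage score, and all function names are hypothetical simplifications chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Nugget:
    text: str
    score: float


def extract_nuggets(domain_text, seeds):
    """Split the text into sentences and keep those mentioning any seed.

    Assumption: seed matching stands in for the abstract's statistical-model
    criterion and linguistic-structure threshold, which are not specified here.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", domain_text) if s.strip()]
    lowered = [seed.lower() for seed in seeds]
    return [s for s in sentences if any(seed in s.lower() for seed in lowered)]


def score_nugget(text, seeds):
    """Toy feature score: fraction of seeds that occur in the nugget."""
    if not seeds:
        return 0.0
    lower = text.lower()
    hits = sum(1 for seed in seeds if seed.lower() in lower)
    return hits / len(seeds)


def build_pseudo_document(domain_text, seeds, score_threshold):
    """Score each nugget, keep those above the threshold, and join them."""
    nuggets = [Nugget(t, score_nugget(t, seeds)) for t in extract_nuggets(domain_text, seeds)]
    selected = [n for n in nuggets if n.score > score_threshold]
    return " ".join(n.text for n in selected)
```

With a hypothetical three-sentence input and the seeds "aspirin", "ibuprofen", and "fever", a threshold of 0.7 keeps only the sentence containing all three seeds, while a threshold of 0.5 also keeps the sentence containing two of them. The resulting string is what would be handed to a question-answering application in place of the full domain data.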
Public/Granted literature
- US20150081275A1 COMPRESSING DATA FOR NATURAL LANGUAGE PROCESSING (Public/Granted day: 2015-03-19)