Invention Grant
- Patent Title: Corpus quality analysis
- Application No.: US15692089
- Application Date: 2017-08-31
- Publication No.: US10169706B2
- Publication Date: 2019-01-01
- Inventor: Corville O. Allen, Andrew R. Freed, Richard A. Salmon, Beata J. Strack
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Stephen R. Tkacs; Stephen J. Walder, Jr.; Diana R. Gerhardt
- Main IPC: G06F17/30
- IPC: G06F17/30; G06N5/02; G06N99/00; G06F17/27

Abstract:
A mechanism is provided in a data processing system for corpus quality analysis. The mechanism applies at least one filter to a candidate corpus to determine a degree to which the candidate corpus supplements existing corpora for performing a natural language processing (NLP) operation. Responsive to a determination to add the candidate corpus to the existing corpora based on a result of applying the at least one filter, the mechanism adds the candidate corpus to the existing corpora to form modified corpora. The mechanism performs the NLP operation using the modified corpora.
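
The flow described in the abstract (apply filters, decide whether to add, then run the NLP operation on the modified corpora) can be illustrated with a minimal Python sketch. This is not the patented implementation: the names `novelty_filter`, `maybe_add_corpus`, and `term_frequencies`, the vocabulary-overlap score, the averaging rule, and the 0.5 threshold are all assumptions chosen only for illustration.

```python
from collections import Counter
from typing import Callable, List, Set

Corpus = List[str]  # assumption: a corpus is modeled as a list of document strings
Filter = Callable[[Corpus, List[Corpus]], float]  # assumption: a filter returns a score in [0, 1]


def vocabulary(corpus: Corpus) -> Set[str]:
    """Lowercased whitespace tokens appearing anywhere in a corpus."""
    return {tok for doc in corpus for tok in doc.lower().split()}


def novelty_filter(candidate: Corpus, existing: List[Corpus]) -> float:
    """Hypothetical filter: share of the candidate's vocabulary that is
    absent from the existing corpora (1.0 = entirely new, 0.0 = nothing new)."""
    cand_vocab = vocabulary(candidate)
    if not cand_vocab:
        return 0.0
    known = set().union(*(vocabulary(c) for c in existing))
    return len(cand_vocab - known) / len(cand_vocab)


def maybe_add_corpus(
    candidate: Corpus,
    existing: List[Corpus],
    filters: List[Filter],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the patent
) -> List[Corpus]:
    """Apply every filter; add the candidate only if the mean score
    reaches the threshold. Returns the (possibly) modified corpora."""
    scores = [f(candidate, existing) for f in filters]
    supplements = bool(scores) and sum(scores) / len(scores) >= threshold
    return existing + [candidate] if supplements else list(existing)


def term_frequencies(corpora: List[Corpus]) -> Counter:
    """Stand-in for 'the NLP operation': token counts over all corpora."""
    return Counter(
        tok for corpus in corpora for doc in corpus for tok in doc.lower().split()
    )


existing = [["the cat sat on the mat"]]
modified = maybe_add_corpus(["quantum flux capacitor"], existing, [novelty_filter])
unchanged = maybe_add_corpus(["the cat sat"], existing, [novelty_filter])
counts = term_frequencies(modified)
```

In this sketch a candidate whose vocabulary is entirely new is appended, while one that only repeats known terms is rejected; a real system would presumably combine several filters that measure different aspects of how well the candidate supplements the existing corpora.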
Public/Granted literature
- US20180005117A1 Corpus Quality Analysis (Public/Granted day: 2018-01-04)