Invention Grant
- Patent Title: Automatic corpus selection and halting condition detection for semantic asset expansion
- Application No.: US15835919
- Application Date: 2017-12-08
- Publication No.: US10740379B2
- Publication Date: 2020-08-11
- Inventor: Alfredo Alba, Clemens Drews, Daniel F. Gruhl, Linda H. Kato, Neal R. Lewis, Pablo N. Mendes, Meenakshi Nagarajan
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Stephen J. Walder, Jr.; Stephen R. Tkacs; William J. Stock
- Main IPC: G06F16/36
- IPC: G06F16/36; G06F7/08; G06F16/35

Abstract:
A mechanism is provided in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to implement an automated lexicon expansion for an identified corpus. For a selected corpus in a set of corpora, the mechanism determines an estimated number of new terms in the selected corpus that are not in the lexicon based on a frequency count of known terms in the selected corpus. Responsive to the estimated number of new terms in the selected corpus being greater than a threshold, the mechanism performs lexicon expansion using the selected corpus to form an expanded lexicon. Responsive to the estimated number of new terms in the selected corpus not being greater than the threshold, the mechanism halts lexicon expansion.
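
The control flow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: the abstract does not say how the estimate is derived from the frequency counts, so the sketch swaps in the bias-corrected Chao1 estimator from species-richness statistics. The names `estimate_new_terms`, `expand_until_halt`, and `add_all_tokens` are hypothetical, and `add_all_tokens` is only a toy stand-in for a real expansion step.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def estimate_new_terms(tokens: Iterable[str], lexicon: Set[str]) -> float:
    """Estimate how many terms in the corpus are missing from the lexicon.

    ASSUMPTION: uses the bias-corrected Chao1 estimator on the frequency
    counts of known terms. The abstract does not specify the estimator.
    """
    counts = Counter(t for t in tokens if t in lexicon)
    f1 = sum(1 for c in counts.values() if c == 1)  # known terms seen once
    f2 = sum(1 for c in counts.values() if c == 2)  # known terms seen twice
    return f1 * (f1 - 1) / (2 * (f2 + 1))


def add_all_tokens(lexicon: Set[str], tokens: List[str]) -> Set[str]:
    """Toy expansion step (hypothetical): adopt every token as a term."""
    return lexicon | set(tokens)


def expand_until_halt(
    corpora: Iterable[List[str]],
    lexicon: Set[str],
    threshold: float,
    expand: Callable[[Set[str], List[str]], Set[str]] = add_all_tokens,
) -> Set[str]:
    """Expand the lexicon corpus by corpus; halt when the estimate is low."""
    current = set(lexicon)
    for tokens in corpora:
        if estimate_new_terms(tokens, current) > threshold:
            # Estimate exceeds the threshold: expand using this corpus.
            current = expand(current, tokens)
        else:
            # Estimate is not greater than the threshold: halt expansion.
            break
    return current


corpora = [
    ["a", "b", "c", "d", "x", "y"],
    ["a", "a", "b", "b", "x", "x"],
]
result = expand_until_halt(corpora, {"a", "b", "c", "d"}, threshold=1.0)
```

In this toy run the first corpus yields an estimate above the threshold and is used for expansion, while the second contains no known terms that occur exactly once, so its estimate is zero and expansion halts.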
Public/Granted literature
- US20180225374A1 Automatic Corpus Selection and Halting Condition Detection for Semantic Asset Expansion Public/Granted day: 2018-08-09