Invention Grant
- Patent Title: Corpus management by automatic categorization into functional domains to support faceted querying
- Application No.: US15354556
- Application Date: 2016-11-17
- Publication No.: US10346442B2
- Publication Date: 2019-07-09
- Inventor: Charles E. Beller, William G. Dubyak, Palani Sakthi, Kristen M. Summers
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Pepper Hamilton LLP
- Main IPC: G06F17/30
- IPC: G06F17/30; G06F16/28; G06F7/08; G06F16/22; G06F16/2455; G06F16/248

Abstract:
Embodiments can provide a computer implemented method, in a data processing system comprising a processor and a memory comprising instructions which are executed by the processor to cause the processor to implement an enhanced corpus management system, the method comprising: identifying one or more functional domain categories; ingesting one or more incoming documents to form an open-domain corpus; for each functional domain category, identifying one or more representative documents to establish a seed sub-corpus; calculating a degree of fit score between each of the one or more incoming documents and the one or more established functional domain category seed sub-corpora; and assigning one or more of the incoming documents to one or more of the functional domain categories based upon the degree of fit score to create an enhanced corpus.
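The steps named in the abstract (seed sub-corpora per functional domain category, a degree of fit score, and score-based assignment) can be illustrated with a minimal sketch. The abstract does not say how the degree of fit score is computed, so this sketch assumes cosine similarity between bag-of-words term counts and an aggregated seed profile; the function names, the `threshold` parameter, and the sample data are all hypothetical and not taken from the patent.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term counts (assumed stand-in for real feature extraction)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_seed_profiles(seed_sub_corpora):
    """Sum the term counts of each category's representative documents."""
    profiles = {}
    for category, docs in seed_sub_corpora.items():
        profile = Counter()
        for doc in docs:
            profile.update(term_vector(doc))
        profiles[category] = profile
    return profiles


def assign_documents(incoming, seed_sub_corpora, threshold=0.2):
    """Score every incoming document against every seed sub-corpus.

    A document is assigned to each category whose score meets the
    (assumed) threshold, so it may land in zero, one, or several categories.
    """
    profiles = build_seed_profiles(seed_sub_corpora)
    enhanced = []
    for doc in incoming:
        vec = term_vector(doc)
        scores = {c: cosine(vec, p) for c, p in profiles.items()}
        categories = sorted(c for c, s in scores.items() if s >= threshold)
        enhanced.append({"text": doc, "scores": scores, "categories": categories})
    return enhanced


# Hypothetical functional domain categories and seed documents.
seeds = {
    "finance": ["stock market shares price", "bank interest loan rate"],
    "medicine": ["patient drug dose treatment", "doctor clinical trial patient"],
}
incoming = [
    "the bank raised the loan rate",
    "patient received a new drug",
    "weather is sunny today",
]
enhanced_corpus = assign_documents(incoming, seeds)

# A faceted query then reduces to filtering on the assigned categories.
finance_docs = [d["text"] for d in enhanced_corpus if "finance" in d["categories"]]
```

Documents that fit no category stay in the open-domain corpus with an empty category list, which is one possible reading of "assigning one or more of the incoming documents"; a real system would likely use a richer scoring model than raw term overlap.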
Public/Granted literature
- US20180137190A1 CORPUS MANAGEMENT BY AUTOMATIC CATEGORIZATION INTO FUNCTIONAL DOMAINS TO SUPPORT FACETED QUERYING Public/Granted day: 2018-05-17