Invention Grant
- Patent Title: Unsupervised corpus expansion using domain-specific terms
- Application No.: US17177459
- Application Date: 2021-02-17
- Publication No.: US11615154B2
- Publication Date: 2023-03-28
- Inventor: Md Faisal Mahbub Chowdhury, Alfio Massimiliano Gliozzo
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Stephanie L. Carusillo
- Main IPC: G06F16/00
- IPC: G06F16/00; G06F16/951; G06F16/9535; G06F40/284; G06F40/30

Abstract:
In an approach to unsupervised corpus expansion using domain-specific terms, one or more computer processors retrieve one or more domain-specific terms from a corpus of text. One or more computer processors search the World Wide Web for the one or more domain-specific terms to produce a plurality of web pages associated with each of the one or more domain-specific terms. One or more computer processors determine a confidence score for each of the plurality of web pages. One or more computer processors determine the confidence score of at least one of the plurality of web pages exceeds a pre-defined threshold. One or more computer processors add the at least one of the plurality of web pages to the corpus of text.
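The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how terms are extracted, how the web is searched, or how the confidence score is computed, so the sketch makes assumptions: the domain-specific terms are passed in already extracted, `search` is a hypothetical caller-supplied function that returns page texts for a term, and `confidence_score` is an illustrative stand-in defined as the fraction of domain terms found in a page. The default threshold of 0.5 is likewise arbitrary.

```python
from typing import Callable, Iterable, List


def confidence_score(page_text: str, domain_terms: Iterable[str]) -> float:
    """Illustrative score (an assumption, not from the patent):
    the fraction of domain terms that occur in the page text."""
    terms = [t.lower() for t in domain_terms]
    if not terms:
        return 0.0
    text = page_text.lower()
    return sum(1 for t in terms if t in text) / len(terms)


def expand_corpus(
    corpus: List[str],
    domain_terms: List[str],
    search: Callable[[str], List[str]],
    threshold: float = 0.5,
) -> List[str]:
    """Return the corpus plus every retrieved page whose score exceeds the threshold.

    `search` is a hypothetical stand-in for a web search: given a term,
    it returns the text of the pages associated with that term.
    """
    expanded = list(corpus)
    seen = set(expanded)
    for term in domain_terms:
        for page in search(term):
            if page in seen:
                continue  # skip pages already in the corpus
            # Strict comparison: the score must exceed the threshold.
            if confidence_score(page, domain_terms) > threshold:
                expanded.append(page)
                seen.add(page)
    return expanded
```

Passing the search step in as a function keeps the sketch runnable without network access; a real system would replace it with calls to a search service and a page fetcher.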
Public/Granted literature
- US20220261444A1 UNSUPERVISED CORPUS EXPANSION USING DOMAIN-SPECIFIC TERMS Public/Granted day: 2022-08-18