Invention Grant
- Patent Title: System and method for building diverse language models
- Application No.: US15670246
- Application Date: 2017-08-07
- Publication No.: US11328121B2
- Publication Date: 2022-05-10
- Inventor: Luciano De Andrade Barbosa, Srinivas Bangalore
- Applicant: Nuance Communications, Inc.
- Applicant Address: US MA Burlington
- Assignee: Nuance Communications, Inc.
- Current Assignee: Nuance Communications, Inc.
- Current Assignee Address: US MA Burlington
- Main IPC: G06F17/28
- IPC: G06F17/28 ; G06F40/216 ; G06F40/40 ; G06F40/10 ; G06F40/205 ; G06F40/242 ; G06F40/279 ; G10L15/06

Abstract:
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls a set of documents in a network of interconnected devices, such as via a crawler operating on a computing device, according to a visitation policy. The visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles, by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.
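The abstract does not specify the model type, the novelty score, or the frontier data structure. The Python sketch below is one possible reading under stated assumptions, not the patented implementation. It assumes an add-one-smoothed unigram model, ranks candidate pages by their perplexity under the current model (highest first), rebuilds the model after each cycle so that it guides the next one, and uses an in-memory dictionary as a stand-in for the network. The names `UnigramLM`, `crawl_cycle`, and `build_diverse_lm` are hypothetical. A real crawler would have to estimate a page's novelty before fetching it (for example from anchor text or the linking page), which this toy sidesteps.

```python
import heapq
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram model (illustrative stand-in, not the patented model)."""

    def __init__(self, docs):
        self.counts = Counter(w for d in docs for w in d.split())
        self.total = sum(self.counts.values())
        # +1 reserves probability mass for words never seen in training.
        self.vocab_size = len(self.counts) + 1

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab_size)

    def perplexity(self, text):
        words = text.split()
        if not words:
            return 0.0  # empty pages are treated as uninformative
        log_sum = sum(math.log(self.prob(w)) for w in words)
        return math.exp(-log_sum / len(words))


def crawl_cycle(web, frontier_urls, lm, budget, seen):
    """Visit up to `budget` pages, highest perplexity (most novel) first.

    `web` maps a URL to (text, outlinks). Returns the collected texts and
    the still-unvisited frontier, to be re-scored in the next cycle.
    """
    # heapq is a min-heap, so perplexity is negated to pop the largest first.
    heap = [(-lm.perplexity(web[u][0]), u)
            for u in frontier_urls if u in web and u not in seen]
    heapq.heapify(heap)
    collected = []
    while heap and len(collected) < budget:
        _, url = heapq.heappop(heap)
        if url in seen:
            continue
        seen.add(url)
        text, links = web[url]
        collected.append(text)
        for link in links:
            if link in web and link not in seen:
                heapq.heappush(heap, (-lm.perplexity(web[link][0]), link))
    remaining = [u for _, u in heap if u not in seen]
    return collected, remaining


def build_diverse_lm(web, seeds, initial_docs, cycles, budget):
    """Alternate crawling and retraining; each cycle's model guides the next."""
    docs = list(initial_docs)
    lm = UnigramLM(docs)
    frontier, seen = list(seeds), set()
    for _ in range(cycles):
        new_docs, frontier = crawl_cycle(web, frontier, lm, budget, seen)
        if not new_docs:
            break
        docs.extend(new_docs)
        lm = UnigramLM(docs)
    return lm, docs
```

On a toy web where page `a` links to a near-duplicate page `b` and an off-topic page `c`, the first cycle prefers `c`, because its words are unseen by the current model and its perplexity is therefore higher. The near-duplicate is left for a later cycle. Perplexity is used here only because the abstract names it as a marker of novelty regions; any score that rewards vocabulary missing from the current model would fit the same loop.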
Public/Granted literature
- US20170337185A1 SYSTEM AND METHOD FOR BUILDING DIVERSE LANGUAGE MODELS Public/Granted day:2017-11-23