- Patent Title: Method of selecting training text for language model, and method of training language model using the training text, and computer and computer program for executing the methods
- Application No.: US14803324
- Application Date: 2015-07-20
- Publication No.: US09934776B2
- Publication Date: 2018-04-03
- Inventor: Nobuyasu Itoh, Gakuto Kurata, Masafumi Nishimura
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Cantor Colburn LLP
- Agent: Rabin Bhattacharya
- Priority: JP2014-150554 20140724
- Main IPC: G10L15/06
- IPC: G10L15/06; G10L15/26; G10L15/197; G10L15/065

Abstract:
A method of selecting training text for a language model includes: generating, from a corpus in a first domain, a template for selecting training text, using one or both of the following generation techniques: (i) replacing one or more words in a word string selected from the corpus in the first domain with a special symbol representing any word or word string, and adopting the word string after replacement as the template; and (ii) adopting the word string selected from the corpus in the first domain, unchanged, as the template; and selecting, from a corpus in a second domain different from the first domain, text covered by the template as the training text. Also provided are a method of training a language model using the selected training text, and a computer and computer program for executing the methods.
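
The two generation techniques and the selection step can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions that the abstract does not state: word strings are taken to be fixed-length word n-grams, the special symbol is written `*` and matches exactly one word (the abstract allows it to stand for a word string as well), only one word per n-gram is replaced, and a sentence counts as "covered" when the fraction of its n-grams matching a template reaches a hypothetical `threshold`. All function names are illustrative.

```python
from typing import Iterable, List, Set, Tuple

WILDCARD = "*"  # assumed spelling of the special symbol; here it matches one word

Template = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Template]:
    """Return all contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def wildcard_variants(gram: Template) -> List[Template]:
    """Replace each single position of an n-gram with the wildcard."""
    return [gram[:i] + (WILDCARD,) + gram[i + 1:] for i in range(len(gram))]


def generate_templates(first_domain: Iterable[str], n: int = 3) -> Set[Template]:
    """Build templates from the first-domain corpus."""
    templates: Set[Template] = set()
    for sentence in first_domain:
        for gram in ngrams(sentence.split(), n):
            templates.add(gram)                       # technique (ii): word string as-is
            templates.update(wildcard_variants(gram))  # technique (i): one word replaced
    return templates


def coverage(sentence: str, templates: Set[Template], n: int = 3) -> float:
    """Fraction of the sentence's n-grams matched by some template (assumed criterion)."""
    grams = ngrams(sentence.split(), n)
    if not grams:
        return 0.0
    covered = sum(
        1 for gram in grams
        if gram in templates or any(v in templates for v in wildcard_variants(gram))
    )
    return covered / len(grams)


def select_training_text(second_domain: Iterable[str], templates: Set[Template],
                         n: int = 3, threshold: float = 0.5) -> List[str]:
    """Keep second-domain sentences whose coverage meets the assumed threshold."""
    return [s for s in second_domain if coverage(s, templates, n) >= threshold]


if __name__ == "__main__":
    in_domain = ["please call me back tomorrow"]
    out_of_domain = ["please call him back tomorrow",
                     "stock prices fell sharply today"]
    templates = generate_templates(in_domain)
    print(select_training_text(out_of_domain, templates))
    # prints ['please call him back tomorrow']
```

In this toy run the first out-of-domain sentence is selected because each of its trigrams matches a wildcard template derived from the in-domain sentence, while the second sentence matches none.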