Invention Grant
- Patent Title: Generation of matched corpus for language model training
- Application No.: US16783402
- Application Date: 2020-02-06
- Publication No.: US11276391B2
- Publication Date: 2022-03-15
- Inventor: Nobuyasu Itoh, Gakuto Kurata, Masayuki Suzuki
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Tutunjian & Bitetto, P.C.
- Agent: Randall Bluestone
- Main IPC: G10L15/183
- IPC: G10L15/183; G10L15/06

Abstract:
A computer-implemented method for generating a text is disclosed. The method includes obtaining a first text collection matched with a target domain and a second text collection including a plurality of samples, each of which describes rewriting between a first text and a second text that has a style different from the first text. The method also includes training a text generation model with the first text collection and the second text collection, in which the text generation model has, in a vocabulary, one or more operation tokens indicating rewriting. The method further includes outputting a plurality of texts obtained from the text generation model.
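The abstract does not say what the operation tokens look like or how a rewriting sample is serialized, so the following is only an illustrative sketch under stated assumptions, not the patented method. It assumes hypothetical tokens `<del>`, `<ins>`, and `</op>`, whitespace tokenization, and a word-level diff (Python's standard `difflib`) to merge a first text and its differently styled second text into one token sequence. The function names `encode_rewrite` and `build_vocabulary` are likewise made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical operation tokens; the abstract does not name them.
DEL, INS, END = "<del>", "<ins>", "</op>"


def encode_rewrite(first, second):
    """Merge a (first text, second text) pair into one token sequence in
    which operation tokens mark the deleted and the inserted words."""
    a, b = first.split(), second.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])          # words shared by both texts
            continue
        if i1 < i2:                       # words only in the first text
            out += [DEL, *a[i1:i2], END]
        if j1 < j2:                       # words only in the second text
            out += [INS, *b[j1:j2], END]
    return out


def build_vocabulary(domain_texts, rewrite_sequences):
    """Collect one vocabulary covering the target-domain collection, the
    encoded rewriting samples, and the operation tokens themselves."""
    vocab = {DEL, INS, END}
    for text in domain_texts:
        vocab.update(text.split())
    for seq in rewrite_sequences:
        vocab.update(seq)
    return vocab


sample = encode_rewrite("I am going to leave", "I am gonna leave")
vocab = build_vocabulary(["please leave a message"], [sample])
```

Here `sample` is `['I', 'am', '<del>', 'going', 'to', '</op>', '<ins>', 'gonna', '</op>', 'leave']`. Under these assumptions, a text generation model trained on both collections shares a single vocabulary in which the operation tokens are ordinary entries, which is the property the abstract describes.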
Public/Granted literature
- US20210248996A1 GENERATION OF MATCHED CORPUS FOR LANGUAGE MODEL TRAINING (Public/Granted day: 2021-08-12)