Invention Grant
- Patent Title: Use of small unit language model for training large unit language models
- Application No.: US15909206
- Application Date: 2018-03-01
- Publication No.: US10832657B2
- Publication Date: 2020-11-10
- Inventor: Masayuki Suzuki, Nobuyasu Itoh, Gakuto Kurata
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant Address: Armonk, NY, US
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: Armonk, NY, US
- Agency: Tutunjian & Bitetto, P.C.
- Agent: Randall Bluestone
- Main IPC: G10L15/06
- IPC: G10L15/06; B41B27/08; G10L15/18; G10L15/183; G10L15/00; G06F40/284; G06F40/47; G06F40/49; G06N3/04

Abstract:
A computer-implemented method, computer program product, and apparatus are provided. The method includes generating a plurality of sequences of small unit tokens from a first language model that is trained with a small unit corpus including the small unit tokens, the small unit corpus having been derived by tokenization with a small unit. The method further includes tokenizing the plurality of sequences of small unit tokens by a large unit that is larger than the small unit, to create a derived large unit corpus including derived large unit tokens.
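The abstract describes a two-step pipeline: sample sequences from a language model trained on a corpus tokenized at a small unit, then re-tokenize those sampled sequences at a larger unit to obtain a derived large-unit corpus. The sketch below is only an illustrative reading of that description, not the patented implementation; the character-level bigram model, the whitespace word tokenizer, and all names (SmallUnitBigramLM, tokenize_large_unit) are assumptions chosen for brevity.

```python
# Hypothetical sketch of the abstract's pipeline:
# 1) train a small-unit (character-level) LM on a small-unit corpus,
# 2) sample new small-unit token sequences from it,
# 3) re-tokenize those sequences with a larger unit (words) to obtain a
#    derived large-unit corpus, which a large-unit LM could then be trained on.
import random
from collections import defaultdict, Counter

class SmallUnitBigramLM:
    """Toy character-bigram model standing in for the 'first language model'."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, small_unit_corpus):
        # small_unit_corpus: iterable of strings tokenized at the small
        # (character) unit; "^" and "$" mark sequence boundaries.
        for line in small_unit_corpus:
            tokens = ["^"] + list(line) + ["$"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.counts[prev][cur] += 1

    def sample(self, max_len=80):
        # Generate one sequence of small-unit tokens.
        out, prev = [], "^"
        for _ in range(max_len):
            choices = self.counts[prev]
            if not choices:
                break
            cur = random.choices(list(choices), weights=list(choices.values()))[0]
            if cur == "$":
                break
            out.append(cur)
            prev = cur
        return "".join(out)


def tokenize_large_unit(text):
    # Re-tokenize a generated sequence with a larger unit (whitespace words).
    return text.split()


if __name__ == "__main__":
    small_unit_corpus = ["the cat sat", "the dog ran", "a cat ran"]
    lm = SmallUnitBigramLM()
    lm.train(small_unit_corpus)

    # Derived large-unit corpus: large-unit token sequences obtained by
    # re-tokenizing sequences sampled from the small-unit model.
    derived_large_unit_corpus = [tokenize_large_unit(lm.sample()) for _ in range(5)]
    for seq in derived_large_unit_corpus:
        print(seq)
```

In this reading, the printed word sequences play the role of the "derived large unit corpus including derived large unit tokens" from which a second, large-unit language model could subsequently be estimated.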
Public/Granted literature
- US20190272318A1: USE OF SMALL UNIT LANGUAGE MODEL FOR TRAINING LARGE UNIT LANGUAGE MODELS, Public/Granted day: 2019-09-05