Invention Grant
- Patent Title: Method for re-aligning corpus and improving the consistency
- Application No.: US16777804
- Application Date: 2020-01-30
- Publication No.: US11276394B2
- Publication Date: 2022-03-15
- Inventors: Nobuyasu Itoh, Gakuto Kurata
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Tutunjian & Bitetto, P.C.
- Agent: Randall Bluestone
- Main IPC: G10L15/183
- IPC: G10L15/183; G10L15/197; G10L15/02; G06F40/49; G06F40/279

Abstract:
Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
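The abstract names the inputs to the deletion decision but does not say how the entropies are computed or compared. The Python sketch below illustrates the idea under several assumptions. It is not the patented method.

- The bootstrap language model is a unigram model, given as a token-to-probability dictionary.
- The "entropy" of a token is approximated by its self-information, -log2 p(token).
- The entropy of the split tokens is the sum of their self-information.
- The decision rule is a hypothetical `threshold`: delete the target token when replacing it with its split tokens costs at most that many extra bits.

All function names and the toy probabilities are hypothetical.

```python
import math
from typing import Dict, List


def self_information(token: str, unigram_probs: Dict[str, float]) -> float:
    """Return -log2 p(token) under an assumed unigram bootstrap model.

    Used here as a stand-in for the per-token "entropy" in the abstract.
    """
    return -math.log2(unigram_probs[token])


def split_entropy(split_tokens: List[str], unigram_probs: Dict[str, float]) -> float:
    """Sum the self-information of each split token (independence assumed)."""
    return sum(self_information(t, unigram_probs) for t in split_tokens)


def should_delete(
    target: str,
    split_tokens: List[str],
    unigram_probs: Dict[str, float],
    threshold: float = 1.0,  # hypothetical tolerance, in bits
) -> bool:
    """Hypothetical rule: delete the target token from the vocabulary when
    representing it by its split tokens costs at most `threshold` extra bits."""
    target_h = self_information(target, unigram_probs)
    split_h = split_entropy(split_tokens, unigram_probs)
    return split_h - target_h <= threshold


# Toy unigram probabilities (invented for illustration; they sum to 1).
probs = {
    "new": 0.25, "york": 0.125, "newyork": 0.0625,
    "to": 0.0625, "kyo": 0.0625, "tokyo": 0.25,
    "<unk>": 0.1875,
}

print(should_delete("newyork", ["new", "york"], probs))  # True
print(should_delete("tokyo", ["to", "kyo"], probs))      # False
```

In this toy case, "newyork" costs 4 bits, while "new" plus "york" costs 2 + 3 bits. That is one extra bit, so the token would be deleted at the default threshold. Splitting "tokyo" would cost 6 extra bits, so it is kept. A real system would likely use a higher-order bootstrap model and a tuned decision rule; the abstract does not specify either.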
Public/Granted literature
- US20200168213A1 METHOD FOR RE-ALIGNING CORPUS AND IMPROVING THE CONSISTENCY; Public/Granted day: 2020-05-28