Method for re-aligning corpus and improving the consistency

Invention Grant

US11276394B2 Method for re-aligning corpus and improving the consistency (legal status: in force)


Patent Title: Method for re-aligning corpus and improving the consistency
Application No.: US16777804

Application Date: 2020-01-30
Publication No.: US11276394B2

Publication Date: 2022-03-15
Inventors: Nobuyasu Itoh, Gakuto Kurata
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Applicant Address: US NY Armonk
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee Address: US NY Armonk
Agency: Tutunjian & Bitetto, P.C.
Agent: Randall Bluestone
Main IPC: G10L15/183
IPC: G10L15/183; G10L15/197; G10L15/02; G06F40/49; G06F40/279


Abstract:

Vocabulary consistency for a language model may be improved by splitting a target token in an initial vocabulary into a plurality of split tokens, calculating an entropy of the target token and an entropy of the plurality of split tokens in a bootstrap language model, and determining whether to delete the target token from the initial vocabulary based on at least the entropy of the target token and the entropy of the plurality of split tokens.
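The abstract does not specify the entropy measure, how the target token is split, or the exact deletion criterion. The following is a minimal Python sketch under loudly stated assumptions and is not the patented method. The bootstrap language model is taken to be a unigram model estimated from a pre-tokenized corpus. A token's "entropy" is taken to be its self-information, -log2 p, in bits. The split is assumed to be the minimum-entropy segmentation of the target into other vocabulary entries. The target is deleted when splitting it costs at most `threshold` extra bits. The names `unigram_entropies`, `best_split`, and `should_delete`, and the threshold value, are hypothetical.

```python
import math
from collections import Counter


def unigram_entropies(corpus_tokens):
    """ASSUMPTION: the bootstrap LM is a unigram model, and a token's
    'entropy' is its self-information -log2 p(token), in bits."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {tok: -math.log2(c / total) for tok, c in counts.items()}


def best_split(target, entropies):
    """Split `target` into two or more in-vocabulary pieces with minimum total
    entropy (Viterbi-style dynamic programming). Returns (pieces, bits) or None."""
    n = len(target)
    best = [None] * (n + 1)  # best[i] = (bits, pieces) for the prefix target[:i]
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            piece = target[start:end]
            # Skip unreachable prefixes, out-of-vocabulary pieces, and the
            # target itself, which would be a "split" into a single piece.
            if best[start] is None or piece == target or piece not in entropies:
                continue
            bits = best[start][0] + entropies[piece]
            if best[end] is None or bits < best[end][0]:
                best[end] = (bits, best[start][1] + [piece])
    if best[n] is None:
        return None
    bits, pieces = best[n]
    return pieces, bits


def should_delete(target, entropies, threshold=2.0):
    """ASSUMPTION (hypothetical rule): delete the target when splitting it adds
    at most `threshold` bits, i.e. the whole token carries little extra information."""
    split = best_split(target, entropies)
    if target not in entropies or split is None:
        return False, None
    pieces, h_split = split
    return h_split - entropies[target] <= threshold, pieces


# Toy corpus (hypothetical data for illustration only).
corpus = ["ice"] * 4 + ["cream"] * 4 + ["icecream"] + ["new"] * 3 + ["york"] * 3 + ["newyork"] * 5
ent = unigram_entropies(corpus)
print(should_delete("icecream", ent))  # (True, ['ice', 'cream'])
print(should_delete("newyork", ent))   # (False, ['new', 'york'])
```

In this toy corpus, "icecream" is rare relative to its parts, so splitting costs only about 0.32 extra bits and the token is flagged for deletion. "newyork" is more frequent than either of its parts, so splitting would cost about 3.47 extra bits and the token is kept. A full vocabulary-cleaning pass would presumably re-segment the corpus and re-estimate the bootstrap model after each round of deletions. That outer loop is omitted here.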

Published application

US20200168213A1 METHOD FOR RE-ALIGNING CORPUS AND IMPROVING THE CONSISTENCY (publication date: 2020-05-28)


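IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	.Speech classification or search
G10L15/18	..using natural language modelling
G10L15/183	...using context dependencies, e.g. language models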