Grapheme-to-phoneme conversion using acoustic data

Invention Grant

US07991615B2 Grapheme-to-phoneme conversion using acoustic data (In force)

Patent Title: Grapheme-to-phoneme conversion using acoustic data
Application No.: US11952267

Application Date: 2007-12-07
Publication No.: US07991615B2

Publication Date: 2011-08-02
Inventor: Xiao Li , Asela J. R. Gunawardana , Alejandro Acero
Applicant: Xiao Li , Asela J. R. Gunawardana , Alejandro Acero
Applicant Address: US WA Redmond
Assignee: Microsoft Corporation
Current Assignee: Microsoft Corporation
Current Assignee Address: US WA Redmond
Main IPC: G10L15/04
IPC: G10L15/04

Abstract:

Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, for example to recognize spoken names more accurately in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences, and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training to adapt graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, which automatically yields a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so that it is not used by the retrained model.
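The confidence filtering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the patented method: the `Utterance` record, its field names, the `collect_training_pairs` helper, the sample values, and the inclusive `>=` comparison are all assumptions made for illustration. It only shows how recognized utterances might be kept as (acoustic data, grapheme label) pairs when the recognizer's confidence meets a threshold.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    audio_id: str          # hypothetical handle to the stored acoustic data
    recognized_name: str   # grapheme label hypothesized by the recognizer
    confidence: float      # recognizer confidence, assumed to lie in [0, 1]


def collect_training_pairs(
    utterances: List[Utterance], threshold: float
) -> List[Tuple[str, str]]:
    """Keep (audio_id, grapheme label) pairs whose confidence meets the threshold.

    Treating the threshold as inclusive is an assumption of this sketch.
    """
    return [
        (u.audio_id, u.recognized_name)
        for u in utterances
        if u.confidence >= threshold
    ]


# Made-up sample data, for illustration only.
calls = [
    Utterance("utt-001", "Asela", 0.92),
    Utterance("utt-002", "Xiao", 0.41),
    Utterance("utt-003", "Alejandro", 0.80),
]
pairs = collect_training_pairs(calls, threshold=0.8)
print(pairs)
```

In this sketch the low-confidence utterance is dropped, and the remaining pairs would be the samples passed on to retraining.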

Public/Granted literature

US20090150153A1 GRAPHEME-TO-PHONEME CONVERSION USING ACOUSTIC DATA Public/Granted day: 2009-06-11

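IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/04	. Segmentation; word boundary detection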