Publication number: WO2009075990A1
Publication date: 2009-06-18
Application number: PCT/US2008/083249
Filing date: 2008-11-12
Applicant: MICROSOFT CORPORATION
Inventor: LI, Xiao; GUNAWARDANA, Asela J.R.; ACERO, Alejandro
CPC classification number: G10L13/08, G10L15/063, G10L15/187
Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phoneme sequences, grapheme sequences, and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training to adapt graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as not to be used by the retrained model.
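The confidence filtering mentioned in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Utterance` record, its field names, the `filter_by_confidence` helper, and the 0.8 threshold are all hypothetical and chosen only for illustration. The sketch assumes each collected sample already carries a grapheme label and a recognizer confidence score.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    # All field names are hypothetical; the abstract does not specify a data layout.
    audio_id: str         # identifier of the received acoustic data
    grapheme_label: str   # label collected without supervision (e.g. a dialed name)
    confidence: float     # recognizer confidence, assumed to lie in [0, 1]


def filter_by_confidence(utterances, threshold=0.8):
    """Return only the utterances whose confidence meets the threshold.

    The threshold value is an assumption made for this sketch.
    """
    return [u for u in utterances if u.confidence >= threshold]


samples = [
    Utterance("utt-001", "Xiao Li", 0.93),
    Utterance("utt-002", "Asela Gunawardana", 0.41),
    Utterance("utt-003", "Alejandro Acero", 0.80),
]

# Only samples at or above the threshold would be passed on to retraining.
kept = filter_by_confidence(samples)
kept_ids = [u.audio_id for u in kept]
```

In this sketch the low-confidence sample is dropped before any retraining step, so a misrecognized name would not contribute a wrong grapheme label to the adapted graphoneme model.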