Canonical training for highly configurable multilingual speech

Invention Grant

US12249336B2 Canonical training for highly configurable multilingual speech (Active)

Patent Title: Canonical training for highly configurable multilingual speech
Application No.: US18573846

Application Date: 2021-06-29
Publication No.: US12249336B2

Publication Date: 2025-03-11
Inventor: Jinyu Li, Long Zhou, Xie Sun, Shujie Liu
Applicant: Microsoft Technology Licensing, LLC; Jinyu Li; Long Zhou; Xie Sun; Shujie Liu
Applicant Address: US WA Redmond; US WA Bellevue; CN Beijing; US WA Bellevue; CN Beijing
Assignee: Microsoft Technology Licensing, LLC; Jinyu Li; Long Zhou; Xie Sun; Shujie Liu
Current Assignee: Microsoft Technology Licensing, LLC; Jinyu Li; Long Zhou; Xie Sun; Shujie Liu
Current Assignee Address: US WA Redmond; US WA Bellevue; CN Beijing; US WA Bellevue; CN Beijing
Agency: Workman Nydegger
International Application: PCT/CN2021/102947 (WO, 2021-06-29)
International Publication: WO2023/272466 (WO, 2023-01-05)
Main IPC: G10L15/32
IPC: G10L15/32; G10L15/00; G10L15/06; G10L15/30

Abstract:

Embodiments are provided for building a configurable multilingual model. A computing system obtains a plurality of language-specific automatic speech recognition modules and a universal automatic speech recognition module trained on a multi-language training dataset comprising training data corresponding to each of the plurality of different languages. The computing system then compiles the universal automatic speech recognition module with the plurality of language-specific automatic speech recognition modules to generate a configurable multilingual model that is configured to selectively and dynamically utilize a sub-set of the plurality of language-specific automatic speech recognition modules with the universal automatic speech recognition module to process audio content in response to user input identifying one or more target languages associated with the audio content.
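The selection mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `ConfigurableMultilingualModel`, the `configure` method, the toy `scale` modules, and the language codes are all hypothetical, and combining module outputs by element-wise summation is an assumption made only for illustration, since the abstract does not say how the outputs are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Module = Callable[[Vector], Vector]


def scale(factor: float) -> Module:
    """Toy stand-in for a trained ASR module (hypothetical)."""
    return lambda features: [factor * value for value in features]


@dataclass
class ConfigurableMultilingualModel:
    """One universal module compiled together with per-language modules."""

    universal: Module
    language_specific: Dict[str, Module]

    def configure(self, target_languages: Sequence[str]) -> Module:
        """Return a processor that uses only the modules for the given languages."""
        unknown = [lang for lang in target_languages
                   if lang not in self.language_specific]
        if unknown:
            raise ValueError(f"no language-specific module for: {unknown}")

        # Keep only the subset named by the user; all other modules stay unused.
        selected = [self.language_specific[lang] for lang in target_languages]

        def process(features: Vector) -> Vector:
            # The universal module always runs.
            output = list(self.universal(features))
            # Assumption: selected module outputs are added element-wise.
            for module in selected:
                contribution = module(features)
                output = [o + c for o, c in zip(output, contribution)]
            return output

        return process


# Hypothetical model with three language-specific modules.
model = ConfigurableMultilingualModel(
    universal=scale(1.0),
    language_specific={"en": scale(2.0), "zh": scale(3.0), "de": scale(4.0)},
)

# User input identifies two target languages; the "de" module is not used.
bilingual = model.configure(["en", "zh"])
result = bilingual([1.0, 2.0])
```

Calling `configure` again with a different language list yields a different processor over the same compiled model, which is the sense in which the selection is dynamic in this sketch.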

Public/Granted literature

US20240265924A1 CANONICAL TRAINING FOR HIGHLY CONFIGURABLE MULTILINGUAL SPEECH Publication date: 2024-08-08

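IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/28	.Constructional details of speech recognition systems
G10L15/32	..Multiple recognisers used in sequence or in parallel; score combination systems therefor, e.g. voting systems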