End-to-end speech recognition with policy learning

Invention Grant

US11056099B2 End-to-end speech recognition with policy learning (status: in force)

Patent Title: End-to-end speech recognition with policy learning
Application No.: US16562257

Application Date: 2019-09-05
Publication No.: US11056099B2

Publication Date: 2021-07-06
Inventors: Yingbo Zhou, Caiming Xiong
Applicant: salesforce.com, inc.
Applicant Address: San Francisco, CA, US
Assignee: salesforce.com, inc.
Current Assignee: salesforce.com, inc.
Current Assignee Address: San Francisco, CA, US
Agency: Haynes and Boone, LLP
Main IPC: G10L15/06
IPC: G10L15/06; G06N3/08; G10L15/14; G10L15/16; G06N3/04; G06N7/00; G10L25/51

Abstract:

The disclosed technology teaches a deep end-to-end speech recognition model, including using multi-objective learning criteria to train the model on training data comprising speech samples temporally labeled with ground truth transcriptions. The multi-objective learning criteria update the model parameters over one thousand to millions of backpropagation iterations by combining, at each iteration, (a) a maximum likelihood objective function that modifies the model parameters to maximize the probability of outputting a correct transcription, and (b) a policy gradient function that modifies the model parameters to maximize a positive reward defined on a non-differentiable performance metric, which penalizes incorrect transcriptions according to how closely they conform to the corresponding ground truth transcriptions. Upon convergence after a final backpropagation iteration, the modified model parameters learned with the multi-objective learning criteria are persisted with the model and applied to further end-to-end speech recognition.
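The combination of a maximum likelihood term and a policy gradient term can be illustrated with a toy example. The sketch below is not the patented implementation: it replaces the deep network with a hypothetical softmax "policy" over a fixed list of candidate transcriptions, and it assumes (1) the non-differentiable metric is word error rate, with reward = negative WER, (2) the two terms are mixed as (1 - lam) * ML + lam * PG with a hypothetical weight `lam`, and (3) the REINFORCE baseline is the expected reward under the current policy. The helper names `edit_distance`, `reward` and `mixed_objective_step` are illustrative only.

```python
import math
import random


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,              # delete a reference token
                          curr[j - 1] + 1,          # insert a hypothesis token
                          prev[j - 1] + (r != h))   # substitution or match
        prev = curr
    return prev[-1]


def reward(ref, hyp):
    """Assumed reward: negative word error rate (0 for a perfect transcription)."""
    return -edit_distance(ref, hyp) / max(len(ref), 1)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_objective_step(logits, candidates, truth_idx, lam, lr, rng):
    """One gradient-ascent step on (1 - lam) * log p(truth) + lam * E[reward].

    The toy policy is a softmax over fixed candidates (a stand-in for the
    real acoustic model); the reward gradient uses a single REINFORCE sample.
    """
    probs = softmax(logits)
    ref = candidates[truth_idx]
    rewards = [reward(ref, c) for c in candidates]
    # Baseline: expected reward under the current policy (reduces variance).
    baseline = sum(p * r for p, r in zip(probs, rewards))
    sample = rng.choices(range(len(candidates)), weights=probs)[0]
    advantage = rewards[sample] - baseline
    new_logits = []
    for k, z in enumerate(logits):
        # For a softmax, d log p(y) / d z_k = 1[k == y] - p_k.
        grad_ml = (1.0 if k == truth_idx else 0.0) - probs[k]
        grad_pg = advantage * ((1.0 if k == sample else 0.0) - probs[k])
        new_logits.append(z + lr * ((1 - lam) * grad_ml + lam * grad_pg))
    return new_logits


# Usage: the ground truth is candidate 0; the others contain errors.
truth = "the cat sat".split()
candidates = [truth, "the cat sad".split(), "a bat".split()]
logits = [0.0, 0.0, 0.0]
rng = random.Random(0)
for _ in range(200):
    logits = mixed_objective_step(logits, candidates, 0, lam=0.5, lr=0.5, rng=rng)
p_truth = softmax(logits)[0]
print(f"p(ground truth) after training: {p_truth:.3f}")
```

Both terms push probability mass toward the correct transcription here, but only the maximum likelihood term needs the ground truth to be scored directly by the model; the policy gradient term only needs a scalar reward for a sampled output, which is why it can optimize a non-differentiable metric such as WER.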

Published application:

US20200005765A1 End-To-End Speech Recognition with Policy Learning (publication date: 2020-01-02)

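IPC classification:

G	Physics
G10	Musical instruments; Acoustics
G10L	Speech analysis or synthesis; Speech recognition; Speech or voice processing; Speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/06	.Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)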