Estimating speaker-specific affine transforms for neural network based speech recognition systems

Granted invention patent

US09378735B1 Estimating speaker-specific affine transforms for neural network based speech recognition systems (in force)

Patent Title: Estimating speaker-specific affine transforms for neural network based speech recognition systems
Application No.: US14135474

Application Date: 2013-12-19
Publication No.: US09378735B1

Publication Date: 2016-06-28
Inventors: Sri Venkata Surya Siva Rama Krishna Garimella, Bjorn Hoffmeister, Nikko Strom
Applicant: Amazon Technologies, Inc.
Applicant Address: US WA Seattle
Assignee: Amazon Technologies, Inc.
Current Assignee: Amazon Technologies, Inc.
Current Assignee Address: US WA Seattle
Attorney, Agent or Firm: Knobbe, Martens, Olson & Bear, LLP
Main IPC: G10L15/16
IPC: G10L15/16; G10L15/06; G10L15/20; G10L13/08

Abstract:

Features are disclosed for estimating affine transforms in Log Filter-Bank Energy Space (“LFBE” space) in order to adapt artificial neural network-based acoustic models to a new speaker or environment. Neural network-based acoustic models may be trained using concatenated LFBEs as input features. The affine transform may be estimated by minimizing the least squares error between the corresponding linear and bias transform parts for the resulting neural network feature vector and for a standard speaker-specific feature vector obtained for a GMM-based acoustic model using constrained Maximum Likelihood Linear Regression (“cMLLR”) techniques. Alternatively, the affine transform may be estimated by minimizing the least squares error between the resulting transformed neural network feature and a standard speaker-specific feature obtained for a GMM-based acoustic model.
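The two estimation options in the abstract can be illustrated with a minimal NumPy sketch. The sketch rests on assumptions that the record does not state. First, the GMM system is assumed to use cepstral features obtained from each LFBE frame by an orthonormal DCT-II matrix `D` with no truncation. Second, the cMLLR transform `(A, b)` is taken as given; random stand-ins are used here. Third, the "standard speaker-specific features" are read as the cMLLR-adapted GMM features. Finally, the per-frame LFBE transform is assumed to be applied identically to every frame of the concatenated input. All function names are hypothetical. This is an illustration of least-squares fitting under those assumptions, not the patented procedure.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (n x n), ASSUMED to map an LFBE frame to cepstra."""
    k = np.arange(n)[:, None]  # cepstral (output) index
    j = np.arange(n)[None, :]  # filter-bank (input) index
    D = np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    D[0] *= np.sqrt(1.0 / n)
    D[1:] *= np.sqrt(2.0 / n)
    return D


def estimate_by_transform_matching(D, A, b):
    """Option 1 (sketch): least-squares fit of the linear part W and bias c so
    that D @ W ~= A @ D and D @ c ~= b, i.e. the LFBE-space transform reproduces
    the cMLLR transform (A, b) once features are mapped through D."""
    W = np.linalg.lstsq(D, A @ D, rcond=None)[0]
    c = np.linalg.lstsq(D, b, rcond=None)[0]
    return W, c


def estimate_by_feature_matching(D, A, b, X):
    """Option 2 (sketch): least-squares fit over frames so that
    D @ (W @ x_t + c) ~= A @ D @ x_t + b for each LFBE frame x_t (columns of X).
    Returns the minimum-norm solution pinv(D) @ Y @ pinv(X_aug)."""
    Y = A @ (D @ X) + b[:, None]              # speaker-adapted GMM-space targets
    Z = np.linalg.lstsq(D, Y, rcond=None)[0]  # targets expressed in LFBE space
    X_aug = np.vstack([X, np.ones((1, X.shape[1]))])    # extra row of ones for c
    V = np.linalg.lstsq(X_aug.T, Z.T, rcond=None)[0].T  # V = [W | c]
    return V[:, :-1], V[:, -1]


def apply_to_spliced(W, c, spliced, n):
    """Hypothetical helper: apply the same per-frame transform to every n-dim
    frame of a concatenated (spliced) LFBE input vector."""
    frames = spliced.reshape(-1, n)  # one row per context frame
    return (frames @ W.T + c).reshape(-1)


rng = np.random.default_rng(0)
n, T = 8, 200
D = dct_matrix(n)
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # stand-in cMLLR matrix
b = 0.5 * rng.standard_normal(n)                    # stand-in cMLLR bias
X = rng.standard_normal((n, T))                     # stand-in LFBE frames

W1, c1 = estimate_by_transform_matching(D, A, b)
W2, c2 = estimate_by_feature_matching(D, A, b, X)
print(np.allclose(W1, W2), np.allclose(c1, c2))  # True True
```

Under the square-DCT assumption with synthetic targets generated from `(A, b)`, the two fits agree, and the printed check confirms this. The code solves through `np.linalg.lstsq` rather than an explicit inverse so that it still runs if `D` is a truncated (wide) DCT. In that case the problem is underdetermined, and the minimum-norm solution returned is only one of many possible choices.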

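IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	.Speech classification or search
G10L15/16	..using artificial neural networks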