Deep multi-channel acoustic modeling using multiple microphone array geometries

Invention Grant

US11574628B1 Deep multi-channel acoustic modeling using multiple microphone array geometries (in force)

Patent Title: Deep multi-channel acoustic modeling using multiple microphone array geometries
Application No.: US16368331

Application Date: 2019-03-28
Publication No.: US11574628B1

Publication Date: 2023-02-07
Inventor: Kenichi Kumatani, Minhua Wu, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister
Applicant: Amazon Technologies, Inc.
Applicant Address: US WA Seattle
Assignee: Amazon Technologies, Inc.
Current Assignee: Amazon Technologies, Inc.
Current Assignee Address: US WA Seattle
Agency: Pierce Atwood LLP
Main IPC: G10L15/16
IPC: G10L15/16; G10L25/30; G10L15/02; G06N3/08

Abstract:

Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
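
The three-stage data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the layer sizes, the random untrained weights, the zero-padding used to accept a variable number of channels, and the "highest output energy" rule for selecting the best geometry output are all assumptions made for illustration, as are the names `first_model`, `second_model`, `third_model`, and `front_end`. Decoding acoustic-unit posteriors into text is omitted.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
MAX_CH, N_BINS = 4, 8                  # up to 4 microphones, 8 values per channel
FEAT_DIM, LOW_DIM, N_UNITS = 16, 4, 5  # first feature, second feature, acoustic units
N_GEOMETRIES = 3                       # number of array geometries modeled

rng = np.random.default_rng(0)

def init(n_in, n_out):
    """Random, untrained weights and zero bias for one dense layer."""
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

geometry_branches = [init(MAX_CH * N_BINS, FEAT_DIM) for _ in range(N_GEOMETRIES)]
feature_layer = init(FEAT_DIM, LOW_DIM)
classifier_layer = init(LOW_DIM, N_UNITS)

def relu_dense(x, params):
    w, b = params
    return np.maximum(x @ w + b, 0.0)

def first_model(channels):
    """Multi-geometry stage: one output per geometry branch, best one kept.

    channels: array of shape (n_channels, N_BINS) with n_channels <= MAX_CH.
    """
    # Assumption: a variable channel count is handled by zero-padding.
    padded = np.zeros((MAX_CH, N_BINS))
    padded[: channels.shape[0]] = channels
    x = padded.reshape(-1)
    outputs = [relu_dense(x, p) for p in geometry_branches]
    # Assumption: "best" means the branch with the highest output energy.
    best = int(np.argmax([np.sum(o ** 2) for o in outputs]))
    return outputs[best], best

def second_model(first_feature):
    """Feature extraction stage: maps to a lower-dimensional vector."""
    return relu_dense(first_feature, feature_layer)

def third_model(second_feature):
    """Classification stage: posterior over acoustic units (softmax)."""
    w, b = classifier_layer
    logits = second_feature @ w + b
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def front_end(channels):
    first_feature, geometry_index = first_model(channels)
    second_feature = second_model(first_feature)
    posteriors = third_model(second_feature)
    return first_feature, second_feature, posteriors, geometry_index
```

Calling `front_end(np.ones((2, N_BINS)))` and `front_end(np.ones((4, N_BINS)))` runs the same pipeline on two-channel and four-channel input, returning a 16-dimensional first feature vector, a 4-dimensional second feature vector, and a posterior over 5 hypothetical acoustic units in both cases.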

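IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	. Speech classification or search
G10L15/16	.. using artificial neural networks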