Separating speech by source in audio recordings by predicting isolated audio signals conditioned on speaker representations

Invention Grant

US11475909B2 Separating speech by source in audio recordings by predicting isolated audio signals conditioned on speaker representations (Active)

Patent Title: Separating speech by source in audio recordings by predicting isolated audio signals conditioned on speaker representations
Application No.: US17170657

Application Date: 2021-02-08
Publication No.: US11475909B2

Publication Date: 2022-10-18
Inventor: Neil Zeghidour, David Grangier
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Fish & Richardson P.C.
Main IPC: G10L21/028
IPC: G10L21/028; G10L21/0316; G10L17/04; G10L17/18; G06N3/04; G06N3/08

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.
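
The two-stage data flow described in the abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names (`speaker_network`, `separation_network`, `separate`), the representation size `dim`, and the fixed `num_speakers` argument are all hypothetical, and both "networks" are toy stand-ins with no learned parameters. The sketch shows only that the first stage yields one representation per speaker and that the second stage, conditioned on those representations, yields one predicted signal per representation.

```python
import math
import random
from typing import List

Vector = List[float]


def speaker_network(recording: Vector, num_speakers: int,
                    dim: int = 4, seed: int = 0) -> List[Vector]:
    """Toy stand-in for the speaker network (hypothetical, not learned).

    Returns one per-recording representation vector for each speaker.
    """
    rng = random.Random(seed)  # stands in for the "speaker parameter values"
    mean = sum(recording) / len(recording)
    return [[mean + rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_speakers)]


def separation_network(recording: Vector,
                       speaker_reps: List[Vector]) -> List[Vector]:
    """Toy stand-in for the separation network (hypothetical, not learned).

    Conditions on the speaker representations and returns one predicted
    signal per representation. Here each signal is just the mixture scaled
    by a softmax gain derived from its representation.
    """
    scores = [sum(rep) for rep in speaker_reps]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    gains = [e / total for e in exps]
    return [[g * x for x in recording] for g in gains]


def separate(recording: Vector, num_speakers: int) -> List[Vector]:
    """Run both stages: recording -> representations -> isolated signals."""
    reps = speaker_network(recording, num_speakers)
    return separation_network(recording, reps)


if __name__ == "__main__":
    mixture = [0.1, -0.2, 0.3, 0.0]
    signals = separate(mixture, num_speakers=2)
    print(len(signals), "predicted signals of length", len(signals[0]))
```

In this toy version the gains sum to one, so the predicted signals add back up to the input mixture; a real separation network would instead produce a distinct waveform per speaker.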

Public/Granted literature

US20210249027A1 SEPARATING SPEECH BY SOURCE IN AUDIO RECORDINGS BY PREDICTING ISOLATED AUDIO SIGNALS CONDITIONED ON SPEAKER REPRESENTATIONS Public/Granted day: 2021-08-12

IPC Classification:

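G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L21/00	Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L19/00 takes precedence)
G10L21/02	.Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems, see H04B3/20; echo suppression in hands-free telephones, see H04M9/08)
G10L21/0272	..Voice signal separating
G10L21/028	...Using properties of the sound source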