Targeted voice separation by speaker conditioned on spectrogram masking

Invention Grant

US11922951B2 Targeted voice separation by speaker conditioned on spectrogram masking (in force)

Patent Title: Targeted voice separation by speaker conditioned on spectrogram masking
Application No.: US17567590

Application Date: 2022-01-03
Publication No.: US11922951B2

Publication Date: 2024-03-05
Inventor: Quan Wang, Prashant Sridhar, Ignacio Lopez Moreno, Hannah Muckenhirn
Applicant: GOOGLE LLC
Applicant Address: US CA Mountain View
Assignee: GOOGLE LLC
Current Assignee: GOOGLE LLC
Current Assignee Address: US CA Mountain View
Agency: Gray Ice Higdon
Main IPC: G10L17/04
IPC: G10L17/04; G10L17/00; G10L17/02; G10L17/18; G10L17/22; G10L25/18

Abstract:

Techniques are disclosed that enable processing of audio data to generate one or more refined versions of audio data, where each of the refined versions of audio data isolates one or more utterances of a single respective human speaker. Various implementations generate a refined version of audio data that isolates utterance(s) of a single human speaker by applying a mask to a spectrogram representation of the audio data (generated by processing the audio data with a frequency transformation). The mask is generated by processing the spectrogram of the audio data and a speaker embedding for the single human speaker using a trained voice filter model. Output generated over the trained voice filter model is processed using an inverse of the frequency transformation to generate the refined audio data.
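
The data flow described in the abstract (frequency transformation, mask prediction from the spectrogram and a speaker embedding, masking, inverse transformation) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not taken from the patent: the frame size, the non-overlapping framewise FFT used as the frequency transformation, and in particular `toy_voice_filter`, which is a hand-written stand-in for the trained voice filter model and simply treats the "speaker embedding" as a per-frequency-bin weight vector.

```python
import numpy as np

FRAME = 256  # samples per frame (assumed value, not from the patent)


def to_spectrogram(audio):
    """Frequency transformation: real FFT over non-overlapping frames."""
    n_frames = len(audio) // FRAME
    frames = audio[: n_frames * FRAME].reshape(n_frames, FRAME)
    return np.fft.rfft(frames, axis=1)  # complex, shape (n_frames, FRAME // 2 + 1)


def from_spectrogram(spec):
    """Inverse of to_spectrogram: framewise inverse real FFT, frames concatenated."""
    return np.fft.irfft(spec, n=FRAME, axis=1).reshape(-1)


def toy_voice_filter(magnitude, speaker_embedding):
    """Hypothetical stand-in for the trained voice filter model.

    A real model would be a trained network; here the embedding is assumed to be
    a per-bin weight vector, clipped to [0, 1] and repeated for every frame.
    """
    weights = np.clip(speaker_embedding, 0.0, 1.0)
    return np.broadcast_to(weights, magnitude.shape)


def refine(audio, speaker_embedding):
    """Spectrogram -> mask -> masked spectrogram -> refined audio."""
    spec = to_spectrogram(audio)
    mask = toy_voice_filter(np.abs(spec), speaker_embedding)
    # Multiplying the complex spectrogram by a real-valued mask scales the
    # magnitude and leaves the phase of the mixture unchanged.
    return from_spectrogram(spec * mask)


# Toy mixture: a "target" tone in FFT bin 8 plus an "interferer" tone in bin 40.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
target = np.sin(2 * np.pi * 250.0 * t)       # 250 Hz  = bin 8 at 31.25 Hz per bin
interferer = np.sin(2 * np.pi * 1250.0 * t)  # 1250 Hz = bin 40
mixture = target + interferer

# Toy "embedding": keep bins below 20, suppress the rest.
embedding = (np.arange(FRAME // 2 + 1) < 20).astype(float)
refined = refine(mixture, embedding)
```

In this toy setup both tones fall exactly on FFT bins, so the mask removes the interferer and `refined` matches the target tone over the framed portion of the signal. A practical system would typically use an overlapping short-time transform and a learned model in place of the fixed weights.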

Public/Granted literature

US20220122611A1 TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING Public/Granted day: 2022-04-21

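IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L17/00	Speaker identification or verification
G10L17/04	.Training, enrolment or model building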