Target speaker separation system, device and storage medium

Invention Grant

US11978470B2 Target speaker separation system, device and storage medium (In force)

Patent Title: Target speaker separation system, device and storage medium
Application No.: US17980473

Application Date: 2022-11-03
Publication No.: US11978470B2

Publication Date: 2024-05-07
Inventor: Jiaming Xu, Jian Cui, Bo Xu
Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
Applicant Address: CN Beijing
Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
Current Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
Current Assignee Address: CN Beijing
Agency: Westbridge IP LLC
Priority: CN 2210602186.2, 2022-05-30
Main IPC: G10L21/0272
IPC: G10L21/0272; G10L17/02; G10L17/04; G10L17/06; G10L21/028; H04S1/00

Abstract:

Disclosed are a target speaker separation system, an electronic device and a storage medium. First, the system performs joint, unified modeling of a plurality of cues based on a masked pre-training strategy, to improve the model's ability to infer missing cues and to represent disturbed cues more accurately. Second, it constructs a hierarchical cue modulation module: a spatial cue is introduced into a primary cue modulation module for directional enhancement of the speaker's speech; in an intermediate cue modulation module, the speaker's speech is enhanced on the basis of the temporal coherence between a dynamic cue and an auditory signal component; and a steady-state cue is introduced into an advanced cue modulation module for selective filtering. Finally, the system makes full use of both supervised learning on simulated data and unsupervised learning on real mixed data.
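
The three-stage ordering described in the abstract (spatial, then dynamic, then steady-state cue) can be illustrated with a toy sketch. This is not the patented method: the function names `modulate` and `hierarchical_modulation`, the sigmoid gating, and the pass-through handling of a missing cue are all assumptions made for illustration only. In particular, the abstract describes a pre-trained model that infers missing cues, whereas this sketch simply skips a stage whose cue is absent.

```python
import math


def sigmoid(x):
    """Standard logistic function, used here as a hypothetical gate."""
    return 1.0 / (1.0 + math.exp(-x))


def modulate(features, cue):
    """Gate each feature by the sigmoid of the matching cue value.

    Assumption: a missing cue (None) leaves the features unchanged.
    """
    if cue is None:
        return list(features)
    return [f * sigmoid(c) for f, c in zip(features, cue)]


def hierarchical_modulation(mixture, spatial_cue=None,
                            dynamic_cue=None, steady_cue=None):
    """Apply the three hypothetical stages in the order the abstract lists."""
    primary = modulate(mixture, spatial_cue)        # directional enhancement
    intermediate = modulate(primary, dynamic_cue)   # temporal-coherence stage
    advanced = modulate(intermediate, steady_cue)   # selective filtering
    return advanced
```

For example, `hierarchical_modulation([1.0, 2.0], spatial_cue=[0.0, 0.0])` halves each feature, because a cue value of zero maps to a gate of 0.5 and the two later stages are skipped.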

Public/Granted literature

US20240005941A1 TARGET SPEAKER SEPARATION SYSTEM, DEVICE AND STORAGE MEDIUM Public/Granted day: 2024-01-04

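IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L21/00	Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L19/00 takes precedence)
G10L21/02	.Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08)
G10L21/0272	..Voice signal separating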