Single-channel and multi-channel source separation enhanced by lip motion

Invention Grant

US11823699B2 Single-channel and multi-channel source separation enhanced by lip motion (Active)

Patent Title: Single-channel and multi-channel source separation enhanced by lip motion
Application No.: US17751428

Application Date: 2022-05-23
Publication No.: US11823699B2

Publication Date: 2023-11-21
Inventor: Yun Li
Applicant: Alibaba Group Holding Limited
Applicant Address: KY George Town
Assignee: Alibaba Group Holding Limited
Current Assignee: Alibaba Group Holding Limited
Current Assignee Address: KY George Town
Agency: Lee & Hayes, P.C.
Main IPC: G06T7/20
IPC: G06T7/20 ; G06K9/00 ; G06K9/62 ; G06N20/10 ; G06N20/20 ; G10L21/028 ; G06V20/40 ; G06V40/16 ; G06F18/25 ; G06V10/764 ; G06V10/80 ; G06V10/82

Abstract:

Methods and systems are provided for implementing source separation techniques, and more specifically for performing source separation on mixed-source single-channel and multi-channel audio signals, enhanced by lip motion information from captured image data. The methods include selecting a target speaker facial image from a plurality of facial images captured over a period of interest; computing a motion vector based on facial features of the target speaker facial image; and separating, based on at least the motion vector, audio corresponding to a constituent source from a mixed-source audio signal captured over the period of interest. The mixed-source audio signal may be captured from single-channel or multi-channel audio capture devices. Separating the audio from the audio signal may be performed by a fusion learning model comprising a plurality of learning sub-models. Separating the audio from the audio signal may also be performed by a blind source separation (“BSS”) learning model.
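
The abstract does not state how the motion vector is computed from facial features. As a minimal illustrative sketch only, assuming the facial features are (x, y) lip landmark coordinates of the target speaker in each captured frame, one simple motion descriptor is the mean landmark displacement between consecutive frames. The function name `lip_motion_vector` and the input layout are hypothetical and are not taken from the patent; the learning models that perform the actual separation are not shown.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lip_motion_vector(frames: Sequence[Sequence[Point]]) -> List[float]:
    """Return the mean lip-landmark displacement for each consecutive frame pair.

    Assumption (not from the patent): frames[t] holds the (x, y) lip
    landmarks of the target speaker at frame t, in a fixed landmark order.
    """
    motion: List[float] = []
    for prev, curr in zip(frames, frames[1:]):
        if not prev or len(prev) != len(curr):
            raise ValueError("each frame needs the same non-zero number of landmarks")
        # Euclidean distance each landmark moved between the two frames.
        dists = [hypot(cx - px, cy - py) for (px, py), (cx, cy) in zip(prev, curr)]
        motion.append(sum(dists) / len(dists))
    return motion


# Three frames with two landmarks each; only the first landmark moves,
# and only between frame 0 and frame 1.
frames = [
    [(0.0, 0.0), (4.0, 0.0)],
    [(0.0, 3.0), (4.0, 0.0)],
    [(0.0, 3.0), (4.0, 0.0)],
]
print(lip_motion_vector(frames))  # [1.5, 0.0]
```

In a pipeline like the one the abstract outlines, a per-frame sequence of this kind would be time-aligned with the mixed-source audio and passed, together with the audio features, to the separation model.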

Public/Granted literature

US20220284594A1 SINGLE-CHANNEL AND MULTI-CHANNEL SOURCE SEPARATION ENHANCED BY LIP MOTION Public/Granted day: 2022-09-08

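IPC classification:

G	Physics
G06	Computing; calculating or counting
G06T	Image data processing or generation, in general
G06T7/00	Image analysis
G06T7/20	.Analysis of motion (motion estimation for coding, decoding, compressing or decompressing digital video signals: see H04N19/43, H04N19/51)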