Invention Grant
- Patent Title: Single-channel and multi-channel source separation enhanced by lip motion
- Application No.: US17751428
- Application Date: 2022-05-23
- Publication No.: US11823699B2
- Publication Date: 2023-11-21
- Inventor: Yun Li
- Applicant: Alibaba Group Holding Limited
- Applicant Address: KY George Town
- Assignee: Alibaba Group Holding Limited
- Current Assignee: Alibaba Group Holding Limited
- Current Assignee Address: KY George Town
- Agency: Lee & Hayes, P.C.
- Main IPC: G06T7/20
- IPC: G06T7/20 ; G06K9/00 ; G06K9/62 ; G06N20/10 ; G06N20/20 ; G10L21/028 ; G06V20/40 ; G06V40/16 ; G06F18/25 ; G06V10/764 ; G06V10/80 ; G06V10/82

Abstract:
Methods and systems are provided for implementing source separation techniques, and more specifically performing source separation on mixed source single-channel and multi-channel audio signals enhanced by inputting lip motion information from captured image data, including selecting a target speaker facial image from a plurality of facial images captured over a period of interest; computing a motion vector based on facial features of the target speaker facial image; and separating, based on at least the motion vector, audio corresponding to a constituent source from a mixed source audio signal captured over the period of interest. The mixed source audio signal may be captured from single-channel or multi-channel audio capture devices. Separating audio from the audio signal may be performed by a fusion learning model comprising a plurality of learning sub-models. Separating the audio from the audio signal may be performed by a blind source separation (“BSS”) learning model.
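The abstract's step of computing a motion vector from facial features can be illustrated with a minimal sketch. This record does not say how the patent computes that vector, so everything below is an assumption made for illustration: the function name `lip_motion_vector`, the input layout (one row of 2-D lip landmark coordinates per video frame, for the already-selected target speaker), and the choice of frame-to-frame displacement as the motion feature.

```python
import numpy as np


def lip_motion_vector(landmarks):
    """Hypothetical helper: frame-to-frame lip landmark displacements.

    landmarks: array of shape (frames, points, 2) holding the (x, y)
    coordinates of lip landmarks for one target speaker (assumed layout).
    Returns an array of shape (frames - 1, points * 2).
    """
    landmarks = np.asarray(landmarks, dtype=float)
    if landmarks.ndim != 3 or landmarks.shape[2] != 2:
        raise ValueError("expected an array of shape (frames, points, 2)")
    if landmarks.shape[0] < 2:
        raise ValueError("need at least two frames to compute motion")

    # Centre each frame on its mean lip position so that whole-head
    # translation does not register as lip motion.
    centred = landmarks - landmarks.mean(axis=1, keepdims=True)

    # Displacement of every landmark between consecutive frames.
    deltas = np.diff(centred, axis=0)

    # Flatten each frame's displacements into a single feature vector.
    return deltas.reshape(deltas.shape[0], -1)


# Three frames, two landmarks: the lips open between frames 0 and 1,
# then the whole face shifts by (5, 5) with no further lip movement.
example = np.array([
    [[0.0, 0.0], [2.0, 0.0]],
    [[0.0, -1.0], [2.0, 1.0]],
    [[5.0, 4.0], [7.0, 6.0]],
])
motion = lip_motion_vector(example)
```

In a pipeline like the one the abstract describes, a feature of this kind would be supplied alongside the mixed source audio signal to the separation model; the separation step itself is not sketched here.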
Public/Granted literature
- US20220284594A1 SINGLE-CHANNEL AND MULTI-CHANNEL SOURCE SEPARATION ENHANCED BY LIP MOTION (Public/Granted day: 2022-09-08)