Invention Grant
- Patent Title: Audio-visual speech separation
- Application No.: US16761707
- Application Date: 2018-11-21
- Publication No.: US11456005B2
- Publication Date: 2022-09-27
- Inventor: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
- Applicant: GOOGLE LLC
- Applicant Address: US CA Mountain View
- Assignee: GOOGLE LLC
- Current Assignee: GOOGLE LLC
- Current Assignee Address: US CA Mountain View
- Agency: Fish & Richardson P.C.
- International Application: PCT/US2018/062330 (WO), 2018-11-21
- International Announcement: WO2019/104229 (WO), 2019-05-31
- Main IPC: G10L21/10
- IPC: G10L21/10; G06K9/62; G10L15/16; G10L21/18; G06V20/40; G06V40/16

Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
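The last steps of the method in the abstract (combining features, then turning per-speaker masks into isolated spectrograms) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the abstract does not say how features are combined or how a mask is used, so per-time-step concatenation and element-wise multiplication are assumptions here, and the names `combine_features`, `apply_mask`, and `separate` are hypothetical. The neural networks that would produce the embeddings and masks are omitted entirely.

```python
from typing import List

Vector = List[float]
Spectrogram = List[List[float]]  # indexed as [time][frequency]


def combine_features(visual: List[List[Vector]], audio: List[Vector]) -> List[Vector]:
    """Fuse features per time step (assumption: simple concatenation).

    visual[s][t] is the visual feature vector of speaker s at time step t;
    audio[t] is the audio embedding at time step t.
    """
    fused = []
    for t, audio_vec in enumerate(audio):
        row: Vector = []
        for speaker_feats in visual:
            row.extend(speaker_feats[t])  # one block per speaker
        row.extend(audio_vec)             # audio embedding goes last
        fused.append(row)
    return fused


def apply_mask(mixture: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Assumption: a mask is applied by element-wise multiplication."""
    return [
        [m * x for m, x in zip(mask_row, mix_row)]
        for mask_row, mix_row in zip(mask, mixture)
    ]


def separate(mixture: Spectrogram, masks: List[Spectrogram]) -> List[Spectrogram]:
    """Return one isolated spectrogram per speaker mask."""
    return [apply_mask(mixture, mask) for mask in masks]


# Toy data: 2 time steps, 2 frequency bins, 2 speakers.
mixture = [[1.0, 2.0], [3.0, 4.0]]
masks = [
    [[1.0, 0.0], [0.5, 0.5]],  # hypothetical mask for speaker 0
    [[0.0, 1.0], [0.5, 0.5]],  # hypothetical mask for speaker 1
]
isolated = separate(mixture, masks)
```

In this toy case the two masks sum to one in every bin, so the isolated spectrograms add back up to the mixture; nothing in the abstract requires that property.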
Public/Granted literature:
- US20200335121A1 AUDIO-VISUAL SPEECH SEPARATION, Public/Granted day: 2020-10-22