Invention Grant
- Patent Title: Representation learning from video with spatial audio
- Application No.: US16868805
- Application Date: 2020-05-07
- Publication No.: US11308329B2
- Publication Date: 2022-04-19
- Inventor: Justin Salamon, Bryan Russell, Karren Yang
- Applicant: Adobe Inc.
- Applicant Address: US CA San Jose
- Assignee: Adobe Inc.
- Current Assignee: Adobe Inc.
- Current Assignee Address: US CA San Jose
- Agency: Kilpatrick Townsend & Stockton LLP
- Main IPC: G06K9/00
- IPC: G06K9/00; H04S7/00; G06K9/62

Abstract:
A computer system is trained to understand audio-visual spatial correspondence using audio-visual clips having multi-channel audio. The computer system includes an audio subnetwork, a video subnetwork, and a pretext subnetwork. The audio subnetwork receives the two channels of audio from the audio-visual clips, and the video subnetwork receives the video frames from the audio-visual clips. In a subset of the audio-visual clips, the audio-visual spatial relationship is misaligned, causing the audio-visual spatial cues for the audio and video to be incorrect. The audio subnetwork outputs an audio feature vector for each audio-visual clip, and the video subnetwork outputs a video feature vector for each audio-visual clip. The audio and video feature vectors for each audio-visual clip are merged and provided to the pretext subnetwork, which is configured to classify the merged vector as either having a misaligned audio-visual spatial relationship or not. The subnetworks are trained based on the loss calculated from the classification.
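The data flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy, not the patented implementation: the abstract does not say how the misaligned clips are produced, so swapping the left and right channels is an assumption made here, and every function name, layer size, and weight below is hypothetical. The sketch covers the forward pass and the loss only; the gradient update that would train the subnetworks is omitted.

```python
import math
import random


def make_pretext_example(left, right, frames, misalign):
    """Build one training example; label 1 means the spatial cues are misaligned."""
    # Assumption (not stated in the abstract): swap left/right to misalign.
    audio = (right, left) if misalign else (left, right)
    return audio, frames, int(misalign)


def init_weights(rows, cols, rng):
    """Hypothetical small random weight matrix standing in for a learned layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def dense_tanh(weights, inputs):
    """One fully connected layer with tanh, used as a toy feature extractor."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def audio_subnetwork(audio, weights):
    """Map the two audio channels to an audio feature vector."""
    first, second = audio
    return dense_tanh(weights, list(first) + list(second))


def video_subnetwork(frames, weights):
    """Map the flattened video frames to a video feature vector."""
    return dense_tanh(weights, [pixel for frame in frames for pixel in frame])


def pretext_subnetwork(merged, weights):
    """Return the predicted probability that the merged vector is misaligned."""
    logit = sum(w * x for w, x in zip(weights, merged))
    return 1.0 / (1.0 + math.exp(-logit))


def classification_loss(prob, label):
    """Binary cross-entropy between the prediction and the alignment label."""
    eps = 1e-12
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1.0 - prob + eps))


def forward(left, right, frames, misalign, params):
    """Run one clip through all three subnetworks and return (prob, loss)."""
    audio, frames, label = make_pretext_example(left, right, frames, misalign)
    # Merge the two feature vectors by concatenation (an assumed merge rule).
    merged = audio_subnetwork(audio, params["audio"]) + video_subnetwork(frames, params["video"])
    prob = pretext_subnetwork(merged, params["pretext"])
    return prob, classification_loss(prob, label)


# Toy clip: 4 audio samples per channel and 2 frames of 3 "pixels" each.
rng = random.Random(0)
left = [0.9, 0.8, 0.7, 0.6]
right = [0.1, 0.0, -0.1, 0.0]
frames = [[0.2, 0.4, 0.6], [0.1, 0.3, 0.5]]
params = {
    "audio": init_weights(3, 8, rng),
    "video": init_weights(3, 6, rng),
    "pretext": [rng.uniform(-0.1, 0.1) for _ in range(6)],
}
prob, loss = forward(left, right, frames, True, params)
```

In a real system the three toy layers would be replaced by trainable networks, and the loss would be backpropagated through the pretext, audio, and video subnetworks together, as the last sentence of the abstract indicates.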
Public/Granted literature
- US20210350135A1 REPRESENTATION LEARNING FROM VIDEO WITH SPATIAL AUDIO Public/Granted day: 2021-11-11