1.
Publication No.: US20220165171A1
Publication Date: 2022-05-26
Application No.: US17535675
Filing Date: 2021-11-25
Inventor: Xing XU, Jingran ZHANG, Fumin SHEN, Jie SHAO, Hengtao SHEN
Abstract: The disclosure provides a method for enhancing audio-visual association through self-supervised curriculum learning. Using contrastive learning within a teacher-student network paradigm, the method trains the visual and audio models without human annotation and extracts meaningful visual and audio representations for a variety of downstream tasks. Specifically, a two-stage self-supervised curriculum learning scheme is proposed that contrasts visual and audio pairs and overcomes the difficulty of transferring knowledge between visual and audio information in the teacher-student framework. Moreover, the knowledge shared between the audio and visual modalities serves as the supervisory signal for contrastive learning. In summary, from large-scale unlabeled data, the method obtains a visual convolutional encoder and an audio convolutional encoder. These encoders benefit downstream tasks and compensate for the training shortfall caused by limited data.
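The abstract does not state the exact loss function, so the following is only a minimal sketch of how contrasting visual and audio pairs is commonly done. It assumes an InfoNCE-style objective in which the visual and audio embeddings of the same clip form the positive pair and all other clips in the batch act as negatives. The function names `l2_normalize` and `info_nce`, the `temperature` value, and the embedding shapes are hypothetical choices for illustration and are not taken from the patent. The sketch covers only the contrastive objective, not the two-stage curriculum or the teacher-student schedule.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def info_nce(visual, audio, temperature=0.07):
    """InfoNCE-style loss over a batch of paired embeddings (assumed objective).

    visual, audio: arrays of shape (batch, dim); row i of each array is
    assumed to come from the same video clip, forming the positive pair.
    """
    v = l2_normalize(visual)
    a = l2_normalize(audio)
    # Similarity of every visual embedding to every audio embedding.
    logits = (v @ a.T) / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positives sit on the diagonal; average their negative log-probability.
    return float(-np.mean(np.diag(log_prob)))
```

With correctly paired embeddings the loss should be lower than with the audio rows shuffled, which is the property that lets the shared audio-visual content act as the supervisory signal without human labels.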