Publication No.: US20210303866A1
Publication Date: 2021-09-30
Application No.: US17038311
Application Date: 2020-09-30
Applicant: Hefei University of Technology
Inventor: Yanxiang CHEN, Huadong TAN, Pengcheng ZHAO, Guang WU
Abstract: A method, a system, and an electronic device for processing audio-visual data are provided. In the method, a first dataset is obtained, where the first dataset includes several data pairs, and each data pair in the first dataset includes a video frame and an audio clip that match each other. A multi-channel feature extraction network model is established to extract the visual features of each video frame and the auditory features of each audio clip in the first dataset. A contrastive loss function model is established using the extracted visual and auditory features to train the multi-channel feature extraction network. A classifier is established to determine whether an input audio-visual data pair is a matching pair.
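The abstract does not give the form of the contrastive loss, so the following is only an illustrative sketch of one common margin-based formulation, not the applicants' actual model. It assumes the visual and auditory features are already extracted as equal-length numeric vectors. The function names, the Euclidean distance, and the `margin` parameter are all assumptions introduced here for illustration.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(visual, audio, matched, margin=1.0):
    """Margin-based contrastive loss for one audio-visual pair.

    Assumed form (not taken from the patent):
      matched pair:    d^2
      mismatched pair: max(0, margin - d)^2
    where d is the distance between the two feature vectors.
    """
    d = euclidean_distance(visual, audio)
    if matched:
        # Pull matching features together.
        return d ** 2
    # Push mismatched features at least `margin` apart.
    return max(0.0, margin - d) ** 2


def batch_contrastive_loss(pairs, margin=1.0):
    """Mean loss over an iterable of (visual, audio, matched) triples."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    total = sum(contrastive_loss(v, a, y, margin) for v, a, y in pairs)
    return total / len(pairs)
```

Under this assumed formulation, a matched pair is penalized by its squared distance, while a mismatched pair contributes to the loss only when its features lie closer together than the margin. In a real training loop the same quantity would be computed on the network's outputs with an automatic-differentiation framework.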