Video retrieval techniques using video contrastive learning
Abstract:
A method, a computer system, and a computer program product are provided for training a neural network to find queried videos. Two pairs of video clips and associated text are obtained from a first dataset and a second dataset. The first dataset is used to train two video encoders: the video clips are provided to the encoders as input, and the encoder outputs are provided to a cosine similarity calculator. The second dataset is used to train a multi-mentor paradigm with two mentors. A first mentor and a second mentor are each provided the pair of textual data inputs. The first mentor provides a similarity value comparison, and the second mentor provides a word mover distance. A contrastive loss is calculated from the outputs of the multi-mentor paradigm and the encoders, and is used for contrastive learning of video features by differentiating similarity and dissimilarity of the video clips.
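The data flow described above can be illustrated with a minimal sketch. The abstract gives no formulas, so everything beyond the cosine similarity is an assumption made for illustration: the encoder outputs are stood in for by plain vectors, the two mentor outputs are passed in as numbers rather than computed from text, and the hypothetical helpers `mentor_target` and `contrastive_loss` use one possible combination rule and one possible margin-based loss, not the claimed ones.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two video feature vectors (encoder outputs)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mentor_target(text_similarity, word_mover_distance):
    """Hypothetical way to merge the two mentor outputs into one soft label.

    Assumption: the distance from the second mentor is mapped into (0, 1]
    and averaged with the first mentor's similarity value.
    """
    wmd_similarity = 1.0 / (1.0 + word_mover_distance)
    return 0.5 * (text_similarity + wmd_similarity)


def contrastive_loss(video_similarity, target, margin=0.5):
    """Assumed soft-label, margin-based contrastive loss.

    A target near 1 pulls the two clips together; a target near 0 pushes
    them apart until their distance exceeds the margin.
    """
    distance = 1.0 - video_similarity
    pull = target * distance ** 2
    push = (1.0 - target) * max(0.0, margin - distance) ** 2
    return pull + push


if __name__ == "__main__":
    # Stand-ins for the outputs of the two video encoders.
    clip_a = [0.2, 0.9, 0.1]
    clip_b = [0.3, 0.8, 0.0]
    sim = cosine_similarity(clip_a, clip_b)
    # Stand-ins for the two mentor outputs on the associated text pair.
    target = mentor_target(text_similarity=0.8, word_mover_distance=0.4)
    print(contrastive_loss(sim, target))
```

In this sketch the mentors act only as a source of soft labels: the more the two texts agree, the more the loss penalizes distance between the corresponding video features.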