Invention Grant
- Patent Title: Joint heterogeneous language-vision embeddings for video tagging and search
- Application No.: US15620232
- Application Date: 2017-06-12
- Publication No.: US11409791B2
- Publication Date: 2022-08-09
- Inventor: Atousa Torabi, Leonid Sigal
- Applicant: Disney Enterprises, Inc.
- Applicant Address: US CA Burbank
- Assignee: Disney Enterprises, Inc.
- Current Assignee: Disney Enterprises, Inc.
- Current Assignee Address: US CA Burbank
- Agency: Patterson + Sheridan, LLP
- Main IPC: G06F16/638
- IPC: G06F16/638; G06N3/08; G06N3/04; H04N21/8405; G06F16/783; G06V20/40

Abstract:
Systems, methods and articles of manufacture for modeling a joint language-visual space. A textual query to be evaluated relative to a video library is received from a requesting entity. The video library contains a plurality of instances of video content. One or more instances of video content from the video library that correspond to the textual query are determined, by analyzing the textual query using a data model that includes a soft-attention neural network module that is jointly trained with a language Long Short-term Memory (LSTM) neural network module and a video LSTM neural network module. At least an indication of the one or more instances of video content is returned to the requesting entity.
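
The retrieval flow the abstract describes (a textual query scored against each video in a library, with soft attention over the video's content) can be illustrated with a toy sketch. This is not the patented model: the jointly trained language and video LSTM modules are replaced here by hand-written vectors assumed to already lie in a shared embedding space, and the dot-product attention, cosine scoring, and the names `attend` and `search` are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def attend(query_vec, frame_vecs):
    """Soft attention (assumed dot-product form): weight each frame by the
    softmax of its similarity to the query, then average the frames."""
    weights = softmax([dot(query_vec, f) for f in frame_vecs])
    dim = len(frame_vecs[0])
    return [sum(w * f[i] for w, f in zip(weights, frame_vecs)) for i in range(dim)]


def search(query_vec, library, top_k=1):
    """Rank videos by cosine similarity between the query embedding and the
    attention-pooled frame embeddings; return the top_k video ids."""
    scored = []
    for video_id, frame_vecs in library.items():
        pooled = attend(query_vec, frame_vecs)
        scored.append((cosine(query_vec, pooled), video_id))
    scored.sort(reverse=True)
    return [video_id for _, video_id in scored[:top_k]]


# Hypothetical two-video library with 2-dimensional frame embeddings.
library = {
    "video_a": [[1.0, 0.0], [0.9, 0.1]],
    "video_b": [[0.0, 1.0], [0.1, 0.9]],
}
best = search([1.0, 0.0], library, top_k=1)
```

In this sketch `search` returns video identifiers rather than the videos themselves, loosely mirroring the abstract's "at least an indication of the one or more instances of video content". In a real system the query and frame vectors would come from trained encoders rather than being written by hand.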
Public/Granted literature
- US20170357720A1 JOINT HETEROGENEOUS LANGUAGE-VISION EMBEDDINGS FOR VIDEO TAGGING AND SEARCH, Public/Granted day: 2017-12-14