Invention Grant
- Patent Title: Joint acoustic and visual processing
- Application No.: US15623682
- Application Date: 2017-06-15
- Publication No.: US10515292B2
- Publication Date: 2019-12-24
- Inventor: David F. Harwath, James R. Glass
- Applicant: MASSACHUSETTS INSTITUTE OF TECHNOLOGY
- Applicant Address: US MA Cambridge
- Assignee: Massachusetts Institute of Technology
- Current Assignee: Massachusetts Institute of Technology
- Current Assignee Address: US MA Cambridge
- Agency: Occhiuti & Rohlicek LLP
- Main IPC: G06K9/62
- IPC: G06K9/62; G06N3/08; G06F3/16; G10L15/18; G10L25/54; G06K9/00; G06N3/04; G10L25/30

Abstract:
An approach to joint acoustic and visual processing associates images with corresponding audio signals, for example, for the retrieval of images according to voice queries. A set of paired images and audio signals is processed without requiring transcription, segmentation, or annotation of either the images or the audio. This processing of the paired images and audio is used to determine parameters of an image processor and an audio processor, with the outputs of these processors being comparable to determine a similarity across acoustic and visual modalities. In some implementations, the image processor and the audio processor make use of deep neural networks. Further embodiments associate parts of images with corresponding parts of audio signals.
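
The idea of two processors whose outputs are directly comparable can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent text: the function names (embed, similarity, rank_images, margin_ranking_loss) are hypothetical, a plain linear projection stands in for each deep neural network, the similarity is a dot product of normalized vectors, and the margin-based ranking loss is one common way of fitting such processors from paired data alone.

```python
import math


def embed(features, weights):
    """Project a feature vector into the shared space and L2-normalize it.

    Stand-in for an image or audio processor (assumption: a linear map,
    where a real system would likely use a deep network).
    """
    projected = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in projected)) or 1.0
    return [v / norm for v in projected]


def similarity(image_vec, audio_vec):
    """Cross-modal similarity: dot product of the two embeddings."""
    return sum(a * b for a, b in zip(image_vec, audio_vec))


def rank_images(query_audio_vec, image_vecs):
    """Return image indices ordered from most to least similar to the query."""
    scores = [similarity(v, query_audio_vec) for v in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)


def margin_ranking_loss(image_vecs, audio_vecs, margin=1.0):
    """Hinge loss over matched pairs versus mismatched ("impostor") pairs.

    Assumed training objective: it needs only the pairing of image i with
    audio i, with no transcription, segmentation, or annotation.
    """
    total = 0.0
    for i, (img, aud) in enumerate(zip(image_vecs, audio_vecs)):
        matched = similarity(img, aud)
        for j in range(len(image_vecs)):
            if j == i:
                continue
            # Impostor audio for this image, then impostor image for this audio.
            total += max(0.0, margin - matched + similarity(img, audio_vecs[j]))
            total += max(0.0, margin - matched + similarity(image_vecs[j], aud))
    return total


# Toy usage with identity projections (hypothetical 2-D features).
identity = [[1.0, 0.0], [0.0, 1.0]]
images = [embed(f, identity) for f in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
query = embed([0.0, 2.0], identity)
print(rank_images(query, images))  # [1, 2, 0]
```

In this toy setting, retrieval by voice query reduces to sorting images by their similarity to the embedded query. Training would adjust the two sets of weights to drive the loss down over the paired data.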
Public/Granted literature
- US20180039859A1 JOINT ACOUSTIC AND VISUAL PROCESSING, Public/Granted day: 2018-02-08