System and method for enhancing machine learning model for audio/video understanding using gated multi-level attention and temporal adversarial training

Invention Grant

US11989939B2 System and method for enhancing machine learning model for audio/video understanding using gated multi-level attention and temporal adversarial training (In force)

Patent Title: System and method for enhancing machine learning model for audio/video understanding using gated multi-level attention and temporal adversarial training
Application No.: US17387889

Application Date: 2021-07-28
Publication No.: US11989939B2

Publication Date: 2024-05-21
Inventor: Saurabh Sahu, Palash Goyal
Applicant: Samsung Electronics Co., Ltd.
Applicant Address: KR Suwon-si
Assignee: Samsung Electronics Co., Ltd.
Current Assignee: Samsung Electronics Co., Ltd.
Current Assignee Address: KR Suwon-si
Main IPC: G06V20/40
IPC: G06V20/40; G06F18/214

Abstract:

A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.
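
The combination step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the dot-product attention pooling, the fixed-length segmentation into "portions", the scalar sigmoid gate (suggested only by the word "gated" in the title), and all names (`attention_pool`, `gated_combine`, `output_representation`, `segment_len`, `gate_weights`). The claimed method may differ in every one of these details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weighted average of vectors; weights are softmax(dot(vector, query)).

    Hypothetical stand-in for an attention layer, not the patented model.
    """
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]


def gated_combine(global_rep, local_rep, gate_weights, gate_bias=0.0):
    """Mix two vectors with a scalar sigmoid gate g: g*global + (1-g)*local.

    The gate form is an assumption; the abstract only says the two
    representations are combined.
    """
    z = sum(w * x for w, x in zip(gate_weights, global_rep + local_rep)) + gate_bias
    g = 1.0 / (1.0 + math.exp(-z))
    return [g * a + (1.0 - g) * b for a, b in zip(global_rep, local_rep)]


def output_representation(frames, query, gate_weights, segment_len=2):
    # Global representation: attend over all frames at once.
    global_rep = attention_pool(frames, query)
    # Local representation: attend within each portion, then across portions.
    segments = [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]
    segment_reps = [attention_pool(seg, query) for seg in segments]
    local_rep = attention_pool(segment_reps, query)
    # Output representation: gated mix of the global and local views.
    return gated_combine(global_rep, local_rep, gate_weights)


# Toy input: four 2-dimensional frame features (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
query = [0.0, 0.0]                    # zero query gives uniform attention
gate_weights = [0.0, 0.0, 0.0, 0.0]   # zero weights give a gate of 0.5
out = output_representation(frames, query, gate_weights)
```

In a trained model the query and gate weights would be learned parameters, and a classifier would then operate on `out`; here they are fixed constants so the sketch stays self-contained.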

Published/Granted Documents

US20220300740A1 SYSTEM AND METHOD FOR ENHANCING MACHINE LEARNING MODEL FOR AUDIO/VIDEO UNDERSTANDING USING GATED MULTI-LEVEL ATTENTION AND TEMPORAL ADVERSARIAL TRAINING Publication/Grant date: 2022-09-22

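IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06V	Image or video recognition or understanding
G06V20/00	Scenes; scene-specific elements (controlling digital cameras H04N5/232)
G06V20/40	. in video content (extracting overlaid text G06V20/62) (video retrieval G06F16/70) (processing of video elementary streams in video servers H04N21/234) (processing of video elementary streams in video clients H04N21/44)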