Leveraging unsupervised meta-learning to boost few-shot action recognition
Abstract:
The disclosure herein describes preparing and using a cross-attention module for action recognition using pre-trained encoders and novel-class fine-tuning. Training video data is transformed into augmented training video segments, which are used to train an appearance encoder and an action encoder. The appearance encoder is trained to encode video segments based on spatial semantics, and the action encoder is trained to encode video segments based on spatio-temporal semantics. A set of hard-mined training episodes is generated using the trained encoders. The cross-attention module is then trained for action-appearance aligned classification using the hard-mined training episodes. Next, support video segments are obtained, wherein each support video segment is associated with a video class. The cross-attention module is fine-tuned using the obtained support video segments and the associated video classes. Finally, a query video segment is obtained and classified as a video class using the fine-tuned cross-attention module.
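The few-shot classification step described above can be sketched as follows. This is a minimal, illustrative reconstruction, not the patented implementation: it assumes appearance and action features are plain vectors, uses dot-product cross-attention of query features over support features, and classifies by cosine similarity to per-class support prototypes. All function and variable names here are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query_feat, support_feats):
    """Dot-product cross-attention: re-express the query feature as an
    attention-weighted combination of the support features (illustrative)."""
    weights = softmax([dot(query_feat, s) for s in support_feats])
    dim = len(support_feats[0])
    return [sum(w * s[i] for w, s in zip(weights, support_feats))
            for i in range(dim)]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def classify(query_feat, support_set):
    """Classify a query feature against labeled support features.
    support_set maps class name -> list of support feature vectors."""
    attended = {
        cls: cross_attend(query_feat, feats)
        for cls, feats in support_set.items()
    }
    # Pick the class whose attended support representation is most
    # similar to the query (nearest-prototype style decision).
    return max(attended, key=lambda cls: cosine(query_feat, attended[cls]))

# Toy usage: two classes with 2-D features; the query lies near "run".
support = {
    "run":  [[1.0, 0.1], [0.9, 0.0]],
    "jump": [[0.0, 1.0], [0.1, 0.9]],
}
print(classify([0.95, 0.05], support))  # → run
```

In the disclosure, the query and support representations would come from the trained appearance and action encoders, and the cross-attention module itself would be learned and fine-tuned rather than a fixed dot-product, but the support/query decision flow is analogous.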