Using intrinsic multimodal features of an image for domain generalization
Abstract:
Various embodiments classify one or more portions of an image by deriving an "intrinsic" modality. The intrinsic modality acts as a substitute for a "text" modality in a multi-modal network. In image processing, a text modality is typically natural language text that describes one or more portions of an image. However, explicit natural language text may not be available across every domain used to train a multi-modal network. Accordingly, various embodiments described herein generate an intrinsic modality, which likewise describes one or more portions of an image, except that the description is not explicit natural language but rather a machine learning model representation. Some embodiments additionally leverage a visual modality obtained from a vision-only model or branch, which may learn domain characteristics that the multi-modal network lacks. Some embodiments further fuse or integrate the intrinsic modality with the visual modality for better generalization.
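For concreteness, the sketch below shows one way the described arrangement could look in PyTorch: a projection head that maps image features into the multi-modal embedding space in place of a text embedding (the "intrinsic" modality), plus a simple fusion of that embedding with features from a vision-only branch. This is a minimal, hypothetical interpretation of the abstract; the module names, the two-layer projection head, and the concatenation-based fusion are all assumptions, not details taken from the patent.

```python
# Hypothetical sketch only; names and architecture choices are illustrative,
# not taken from the patent.
import torch
import torch.nn as nn


class IntrinsicModalityHead(nn.Module):
    """Maps image features into the multi-modal embedding space, standing in
    for an explicit natural-language text embedding (assumed design)."""

    def __init__(self, vis_dim: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vis_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # A learned "description" of the image: a model representation,
        # not explicit natural language.
        return self.proj(image_feats)


class FusedClassifier(nn.Module):
    """Fuses the intrinsic (pseudo-text) modality with features from a
    vision-only branch before classifying."""

    def __init__(self, vis_dim: int, embed_dim: int, num_classes: int):
        super().__init__()
        self.intrinsic_head = IntrinsicModalityHead(vis_dim, embed_dim)
        self.classifier = nn.Linear(vis_dim + embed_dim, num_classes)

    def forward(
        self,
        multimodal_feats: torch.Tensor,   # features from the multi-modal image encoder
        vision_only_feats: torch.Tensor,  # features from the vision-only branch
    ) -> torch.Tensor:
        intrinsic = self.intrinsic_head(multimodal_feats)
        # Concatenation is one simple fusion choice; the abstract only says
        # the modalities are "fused or integrated".
        fused = torch.cat([vision_only_feats, intrinsic], dim=-1)
        return self.classifier(fused)


# Usage example with arbitrary dimensions (assumes both branches emit 512-d features).
model = FusedClassifier(vis_dim=512, embed_dim=256, num_classes=10)
logits = model(torch.randn(4, 512), torch.randn(4, 512))  # -> shape (4, 10)
```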