- Patent Title: Using intrinsic multimodal features of image for domain generalized
- Application No.: US17976541
- Application Date: 2022-10-28
- Publication No.: US12340571B2
- Publication Date: 2025-06-24
- Inventor: Puneet Mangla, Milan Aggarwal, Balaji Krishnamurthy
- Applicant: ADOBE INC.
- Applicant Address: US CA San Jose
- Assignee: ADOBE INC.
- Current Assignee: ADOBE INC.
- Current Assignee Address: US CA San Jose
- Agency: Shook, Hardy & Bacon L.L.P.
- Agent: Joseph W. Cruz
- Main IPC: G06V10/80
- IPC: G06V10/80; G06F40/40; G06V10/764; G06V10/77; G06V10/774; G06V10/82; G06V10/86

Abstract:
Various embodiments classify one or more portions of an image based on deriving an “intrinsic” modality. Such an intrinsic modality acts as a substitute for a “text” modality in a multi-modal network. A text modality in image processing is typically natural language text that describes one or more portions of an image. However, explicit natural language text may not be available across one or more domains for training a multi-modal network. Accordingly, various embodiments described herein generate an intrinsic modality, which is also a description of one or more portions of an image, except that the description is not an explicit natural language description but rather a machine learning model representation. Some embodiments additionally leverage a visual modality obtained from a vision-only model or branch, which may learn domain characteristics that are not present in the multi-modal network. Some embodiments additionally fuse or integrate the intrinsic modality with the visual modality for better generalization.
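
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how the intrinsic modality is derived or how the two modalities are fused, so the linear projection, the concatenation-based fusion, the linear classifier, and every name and dimension below (`classify`, `W_intrinsic`, `W_cls`, `D_MM`, and so on) are assumptions chosen only for illustration, with random weights standing in for trained ones.

```python
import math
import random


def linear(x, weights, bias):
    """Compute y = W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def classify(multimodal_image_emb, visual_emb, params):
    # Assumption: the intrinsic modality is a learned projection of the
    # multi-modal network's image embedding, used in place of a text embedding.
    intrinsic = linear(multimodal_image_emb, params["W_intrinsic"], params["b_intrinsic"])
    # Assumption: fusion is plain concatenation with the vision-only features.
    fused = intrinsic + visual_emb
    logits = linear(fused, params["W_cls"], params["b_cls"])
    return softmax(logits)


rng = random.Random(0)
D_MM, D_INT, D_VIS, N_CLASSES = 8, 4, 6, 3  # illustrative sizes only

params = {
    "W_intrinsic": random_matrix(D_INT, D_MM, rng),
    "b_intrinsic": [0.0] * D_INT,
    "W_cls": random_matrix(N_CLASSES, D_INT + D_VIS, rng),
    "b_cls": [0.0] * N_CLASSES,
}

# Random vectors stand in for the two branches' outputs on one image.
multimodal_image_emb = [rng.gauss(0.0, 1.0) for _ in range(D_MM)]
visual_emb = [rng.gauss(0.0, 1.0) for _ in range(D_VIS)]

probs = classify(multimodal_image_emb, visual_emb, params)
print(probs)
```

Concatenation is used here only because it is the simplest way to combine two feature vectors; the abstract's "fuse or integrate" could equally cover other mechanisms.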
Public/Granted literature
- US20240153258A1 USING INTRINSIC MULTIMODAL FEATURES OF IMAGE FOR DOMAIN GENERALIZED Public/Granted day: 2024-05-09