Invention Grant
- Patent Title: Systems and methods for unified vision-language understanding and generation
- Application No.: US17745634
- Application Date: 2022-05-16
- Publication No.: US12288380B2
- Publication Date: 2025-04-29
- Inventor: Junnan Li, Chu Hong Hoi
- Applicant: Salesforce, Inc.
- Applicant Address: US CA San Francisco
- Assignee: Salesforce, Inc.
- Current Assignee: Salesforce, Inc.
- Current Assignee Address: US CA San Francisco
- Agency: Haynes and Boone, LLP
- Main IPC: G06V10/774
- IPC: G06V10/774; G06F40/126; G06F40/284; G06T9/00; G06V10/764; G06V10/80

Abstract:
Embodiments described herein provide systems, methods, and devices for generating enhanced vision-language training data. A method may include: receiving, from a communication interface, a first training dataset of image-text pairs and a second training dataset of annotated image-text pairs; fine-tuning an image-grounded text decoder and an image-grounded text encoder using the second training dataset of annotated image-text pairs; generating, by the fine-tuned image-grounded text decoder, a predicted text based on a training image from the first training dataset; generating, by the fine-tuned image-grounded text encoder, a filtering decision based on the training image and the predicted text; adding the training image and the predicted text to form a third training dataset of image-text pairs depending on the filtering decision; and training a vision-language model using the third training dataset of image-text pairs.
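
The generate-then-filter step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `build_third_dataset`, the parameters `captioner` and `keep_pair`, and the use of plain strings as image identifiers are all hypothetical stand-ins. The sketch assumes the decoder and encoder have already been fine-tuned on the second (annotated) dataset and are exposed as simple callables.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-ins: an image is represented by an identifier string.
Image = str
Pair = Tuple[Image, str]


def build_third_dataset(
    first_dataset: Iterable[Pair],
    captioner: Callable[[Image], str],
    keep_pair: Callable[[Image, str], bool],
) -> List[Pair]:
    """Collect (image, predicted text) pairs that pass the filtering decision.

    `captioner` plays the role of the fine-tuned image-grounded text decoder;
    `keep_pair` plays the role of the fine-tuned image-grounded text encoder.
    Both are assumptions made for illustration only.
    """
    third_dataset: List[Pair] = []
    for image, _original_text in first_dataset:
        # Decoder step: predict a text for the training image.
        predicted_text = captioner(image)
        # Encoder step: decide whether the image and predicted text are kept.
        if keep_pair(image, predicted_text):
            third_dataset.append((image, predicted_text))
    return third_dataset
```

In use, `captioner` and `keep_pair` would wrap the two fine-tuned models, and the returned list would serve as the third training dataset for the vision-language model.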
Public/Granted literature
- US20230237773A1 SYSTEMS AND METHODS FOR UNIFIED VISION-LANGUAGE UNDERSTANDING AND GENERATION Public/Granted day: 2023-07-27