Systems and methods for unified vision-language understanding and generation
Abstract:
Embodiments described herein provide systems, methods, and devices for generating enhanced vision-language training data. A method may include: receiving, from a communication interface, a first training dataset of image-text pairs and a second training dataset of annotated image-text pairs; fine-tuning an image-grounded text decoder and an image-grounded text encoder using the second training dataset of annotated image-text pairs; generating, by the fine-tuned image-grounded text decoder, a predicted text based on a training image from the first training dataset; generating, by the fine-tuned image-grounded text encoder, a filtering decision based on the training image and the predicted text; adding the training image and the predicted text to a third training dataset of image-text pairs depending on the filtering decision; and training a vision-language model using the third training dataset of image-text pairs.
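The abstract describes a caption-and-filter bootstrapping loop: a fine-tuned decoder proposes captions for web images, a fine-tuned encoder decides which image-caption pairs to keep, and the surviving pairs form a new training dataset. The Python sketch below illustrates that flow only; the Captioner, CaptionFilter, and bootstrap_dataset names and their stub methods are illustrative assumptions, not interfaces defined by the patent.

```python
from typing import Iterable, List, Tuple

# A pair of (image identifier, caption text). Real systems would carry
# image tensors or file paths; a string identifier keeps the sketch simple.
ImageTextPair = Tuple[str, str]


class Captioner:
    """Stand-in for the image-grounded text decoder (hypothetical)."""

    def fine_tune(self, annotated: Iterable[ImageTextPair]) -> None:
        """Fine-tune the decoder on human-annotated image-text pairs (stub)."""

    def generate(self, image: str) -> str:
        """Predict a caption for an image (stub)."""
        return f"a generated caption for {image}"


class CaptionFilter:
    """Stand-in for the image-grounded text encoder (hypothetical)."""

    def fine_tune(self, annotated: Iterable[ImageTextPair]) -> None:
        """Fine-tune the encoder on human-annotated image-text pairs (stub)."""

    def matches(self, image: str, text: str) -> bool:
        """Return the filtering decision: does the text describe the image? (stub)"""
        return True


def bootstrap_dataset(
    web_pairs: Iterable[ImageTextPair],
    annotated_pairs: List[ImageTextPair],
    captioner: Captioner,
    caption_filter: CaptionFilter,
) -> List[ImageTextPair]:
    """Build the third training dataset of image-text pairs."""
    # Fine-tune decoder and encoder on the annotated (second) dataset.
    captioner.fine_tune(annotated_pairs)
    caption_filter.fine_tune(annotated_pairs)

    bootstrapped: List[ImageTextPair] = []
    for image, _noisy_web_text in web_pairs:
        predicted = captioner.generate(image)            # predicted text
        if caption_filter.matches(image, predicted):     # filtering decision
            bootstrapped.append((image, predicted))      # add to third dataset
    return bootstrapped
```

The bootstrapped dataset returned here would then be used to train the downstream vision-language model, as stated in the final step of the abstract.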