-
Publication number: US20230401827A1
Publication date: 2023-12-14
Application number: US17806097
Application date: 2022-06-09
Applicant: ADOBE INC.
Inventor: Jason Wen Yong Kuen , Dat Ba Huynh , Zhe Lin , Jiuxiang Gu
IPC: G06V10/774 , G06V10/26 , G06V10/75 , G06V10/77 , G06V10/776 , G06V10/82
CPC classification number: G06V10/774 , G06V10/26 , G06V10/759 , G06V10/7715 , G06V10/776 , G06V10/82
Abstract: Systems and methods for image segmentation are described. Embodiments of the present disclosure receive a training image and a caption for the training image, wherein the caption includes text describing an object in the training image; generate a pseudo mask for the object using a teacher network based on the text describing the object; generate a mask for the object using a student network; compute noise information for the training image using a noise estimation network; and update parameters of the student network based on the mask, the pseudo mask, and the noise information.
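The abstract does not say how the mask, the pseudo mask, and the noise information are combined. As a minimal sketch of one possibility, the Python below treats the noise information as a per-pixel value in [0, 1] that down-weights a binary cross-entropy loss between the student mask and the teacher's pseudo mask. The function name `noise_weighted_bce`, the weighting rule `1 - noise`, and the flat-list mask layout are all assumptions for illustration, not the claimed method.

```python
import math


def noise_weighted_bce(student_mask, pseudo_mask, noise, eps=1e-7):
    """Hypothetical loss: BCE between student and pseudo masks,
    with each pixel weighted by (1 - estimated noise).

    All three arguments are flat lists of floats of equal length.
    The weighting scheme is an assumption, not taken from the patent.
    """
    total = 0.0
    weight_sum = 0.0
    for p, t, n in zip(student_mask, pseudo_mask, noise):
        # Clamp the prediction so log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        bce = -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        w = 1.0 - n  # pixels judged noisy contribute less
        total += w * bce
        weight_sum += w
    # If every pixel is judged fully noisy, there is nothing to learn from.
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    student = [0.9, 0.5]
    pseudo = [1.0, 1.0]
    noise = [0.0, 1.0]  # second pixel's pseudo label is treated as unreliable
    print(noise_weighted_bce(student, pseudo, noise))
```

In a real training loop, a loss of this kind would be backpropagated through the student network only, with the teacher's output held fixed.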
-
Publication number: US11816243B2
Publication date: 2023-11-14
Application number: US17397407
Application date: 2021-08-09
Applicant: Adobe Inc.
Inventor: Thi Kim Phung Lai , Tong Sun , Rajiv Jain , Nikolaos Barmpalios , Jiuxiang Gu , Franck Dernoncourt
IPC: G06F21/62 , G06N20/00 , G06F40/295
CPC classification number: G06F21/6245 , G06F40/295 , G06N20/00
Abstract: Systems, methods, and non-transitory computer-readable media can generate a natural language model that provides user-entity differential privacy. For example, in one or more embodiments, a system samples sensitive data points from a natural language dataset. Using the sampled sensitive data points, the system determines gradient values corresponding to the natural language model. Further, the system generates noise for the natural language model. The system generates parameters for the natural language model using the gradient values and the noise, facilitating simultaneous protection of the users and sensitive entities associated with the natural language dataset. In some implementations, the system generates the natural language model through an iterative process (e.g., by iteratively modifying the parameters).
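The sequence in the abstract (sample data points, determine gradient values, generate noise, produce parameters from both) resembles the familiar clip-and-add-noise pattern of differentially private gradient descent. The Python below is a generic sketch of that pattern, not the patent's user-entity mechanism: the function names, the per-sample clipping, the Gaussian noise scale, and the learning rate are all assumptions chosen for illustration.

```python
import math
import random


def sample_points(dataset, rate, rng):
    """Keep each data point independently with probability `rate`."""
    return [x for x in dataset if rng.random() < rate]


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_step(params, per_sample_grads, max_norm, noise_std, lr, rng):
    """One hypothetical update: clip each gradient, sum, add noise, average."""
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    n = len(clipped)
    summed = [sum(g[i] for g in clipped) for i in range(len(params))]
    # Gaussian noise scaled to the clipping bound (an assumed calibration).
    noisy = [(s + rng.gauss(0.0, noise_std * max_norm)) / n for s in summed]
    return [p - lr * g for p, g in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "gradients" standing in for values computed from sampled data.
    grads = sample_points([[3.0, 4.0], [0.3, 0.4], [-1.0, 2.0]], 1.0, rng)
    params = [1.0, 1.0]
    for _ in range(3):  # iterative modification of the parameters
        params = noisy_step(params, grads, 1.0, 0.5, 0.1, rng)
    print(params)
```

Clipping bounds how much any single sampled point can move the parameters, which is what lets the added noise mask an individual contribution.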
-
Publication number: US20230154221A1
Publication date: 2023-05-18
Application number: US17528061
Application date: 2021-11-16
Applicant: ADOBE INC.
Inventor: Jiuxiang Gu , Ani Nenkova Nenkova , Nikolaos Barmpalios , Vlad Ion Morariu , Tong Sun , Rajiv Bhawanji Jain , Jason Wen Yong Kuen , Handong Zhao
CPC classification number: G06K9/00463 , G06K9/6256 , G06K9/629 , G06F40/30 , G06N20/00 , G06N3/02
Abstract: The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data. A bounding box for each sentence is generated. For each sentence, a set of predicted features is generated by using an encoder machine-learning model. The encoder model performs cross-attention between a set of masked-textual features for the sentence and a set of masked-visual features for the sentence. The set of masked-textual features is based on a masking function and the sentence. The set of masked-visual features is based on the masking function and the corresponding bounding box. A document-encoder model is pretrained based on the set of predicted features for each sentence and pretraining tasks. The pretraining tasks include masked sentence modeling, visual contrastive learning, or visual-language alignment.
-
Publication number: US20220147838A1
Publication date: 2022-05-12
Application number: US17093185
Application date: 2020-11-09
Applicant: Adobe Inc.
Inventor: Jiuxiang Gu , Vlad Ion Morariu , Tong Sun , Jason Wen Yong Kuen , Handong Zhao
Abstract: Methods and systems disclosed herein relate generally to generating visual relationship graphs that identify relationships between objects depicted in an image. A vision-language application uses transformer encoders to generate a graph structure, in which the graph structure represents a dependency between a first region and a second region of an image. The dependency indicates that a contextual representation of the first region was derived, at least in part, by processing the second region. The contextual representation identifies a predicted identity of an image object depicted in the first region. The predicted identity is determined at least in part by identifying a relationship between the first region and other data objects associated with various modalities.
-