-
Publication No.: US20240386002A1
Publication Date: 2024-11-21
Application No.: US18319748
Application Date: 2023-05-18
Applicant: Adobe Inc.
Inventor: Raunak Shah, Koyel MUKHERJEE, Subrata MITRA, Dhruv JOSHI, Sai KARNAM, Shivam Pravin BHOSALE
IPC: G06F16/215, G06F16/28, G06F40/284
Abstract: A dataset comprising tables is received. Embeddings are generated for the column titles of each table, and tables with similar embeddings are clustered. The tables are further organized into smaller clusters based on statistical similarities, and similarity scores are calculated for pairs of tables within the same cluster. A relatedness graph is created from the similarity scores, in which similar tables are represented by nodes connected by edges. If the similarity score for a pair of tables exceeds a threshold, one table of the pair is deleted.
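A minimal Python sketch of the deduplication flow described above, under loudly stated assumptions: a bag-of-words vector over column-title tokens stands in for a learned embedding model, greedy cosine-similarity clustering stands in for the clustering step, the second statistics-based clustering step is omitted, and the pairwise similarity score is a Jaccard overlap of cell values. The names `embed_titles`, `cluster_tables`, `value_jaccard` and `deduplicate`, the 0.5 clustering cutoff, the 0.8 deletion threshold, and the rule of keeping the table with more rows are all hypothetical, not taken from the patent.

```python
import math
from collections import Counter
from itertools import combinations


def embed_titles(columns):
    """Hypothetical stand-in for a learned embedding: bag of words over column titles."""
    return Counter(tok for col in columns for tok in col.lower().replace("_", " ").split())


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_tables(tables, min_sim=0.5):
    """Greedy clustering: a table joins the first cluster whose seed table is similar enough."""
    clusters = []
    for name in tables:
        emb = embed_titles(tables[name]["columns"])
        for cluster in clusters:
            if cosine(emb, embed_titles(tables[cluster[0]]["columns"])) >= min_sim:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def value_jaccard(t1, t2):
    """Hypothetical similarity score: Jaccard overlap of the sets of cell values."""
    v1 = {v for row in t1["rows"] for v in row}
    v2 = {v for row in t2["rows"] for v in row}
    union = v1 | v2
    return len(v1 & v2) / len(union) if union else 1.0


def deduplicate(tables, threshold=0.8):
    graph = {name: {} for name in tables}  # relatedness graph: node -> {neighbor: score}
    for cluster in cluster_tables(tables):
        for a, b in combinations(cluster, 2):  # scores only within a cluster
            score = value_jaccard(tables[a], tables[b])
            graph[a][b] = graph[b][a] = score
    deleted = set()
    for a in sorted(graph):
        for b, score in sorted(graph[a].items()):
            if score > threshold and a not in deleted and b not in deleted:
                # Assumed rule: keep the table with more rows (on a tie, drop b).
                deleted.add(a if len(tables[a]["rows"]) < len(tables[b]["rows"]) else b)
    kept = {n: t for n, t in tables.items() if n not in deleted}
    return kept, graph


tables = {
    "orders": {"columns": ["order_id", "customer_id", "amount"], "rows": [(1, 10, 5.0), (2, 11, 7.5)]},
    "orders_copy": {"columns": ["order_id", "customer_id", "amount"], "rows": [(1, 10, 5.0), (2, 11, 7.5)]},
    "products": {"columns": ["sku", "title"], "rows": [("a1", "pen")]},
}
kept, graph = deduplicate(tables)
print(sorted(kept))  # → ['orders', 'products']
```

Here `orders_copy` duplicates `orders`, so their edge scores 1.0 and one of the pair is dropped, while `products` lands in its own cluster and is kept.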
-
Publication No.: US20240220502A1
Publication Date: 2024-07-04
Application No.: US18092779
Application Date: 2023-01-03
Applicant: Adobe Inc.
Inventor: Vibhor PORWAL, Yeuk-Yin CHAN, Vidit BHATIA, Subrata MITRA, Shaddy GARG, Sergey N. KAZARIN, Sameeksha ARORA, Himanshu PANDAY, Gautam Pratap KOWSHIK, Fan DU, Anup Bandigadi RAO, Anil MALKANI
IPC: G06F16/2453, G06F16/2458
CPC classification number: G06F16/24544, G06F16/2462
Abstract: To retrieve information derived from a plurality of separately stored datasets, join structures are identified within the plurality of separately stored datasets. Join structures can include datasets joined by a central dataset, datasets joined by a single key, and datasets joined across a plurality of keys. Each of the join structures corresponds to a query processing schema that defines a sampling technique. When a join query is received as a SQL query, the join query identifies a portion of the plurality of separately stored datasets, from which a join structure is selected and a corresponding query processing schema is identified. The join query is reconstructed to form a reconstructed join query that comprises query processing schema instructions to derive the requested information using the sampling technique defined by the identified query processing schema.
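The structure-to-schema lookup and query reconstruction can be sketched in Python as below. Everything specific here is a hypothetical assumption: the `Join` record, the three structure labels, the order in which they are tested, the sampling descriptions in `QUERY_PROCESSING_SCHEMAS`, and carrying the chosen schema as a leading SQL comment. The abstract does not say how structures are detected or how the schema instructions are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Join:
    """One join edge between two datasets (hypothetical representation)."""
    left: str
    right: str
    keys: tuple  # join-key column names


# Hypothetical structure -> sampling-technique mapping, for illustration only.
QUERY_PROCESSING_SCHEMAS = {
    "single_key": "hash-sample every dataset on the shared join key",
    "central": "sample the central dataset and join the others in full",
    "multi_key": "sample each dataset independently",
}


def classify_join_structure(joins):
    """Label the join structure; the checks run in this assumed order."""
    keys = {j.keys for j in joins}
    if len(keys) == 1 and len(next(iter(keys))) == 1:
        return "single_key"  # every join uses the same one-column key
    datasets = {d for j in joins for d in (j.left, j.right)}
    hubs = [d for d in datasets if all(d in (j.left, j.right) for j in joins)]
    if len(joins) > 1 and hubs:
        return "central"  # one dataset takes part in every join
    return "multi_key"


def reconstruct_query(sql, joins):
    """Prefix the SQL with the selected schema as a comment (hypothetical encoding)."""
    structure = classify_join_structure(joins)
    schema = QUERY_PROCESSING_SCHEMAS[structure]
    return f"/* join_structure={structure}; sampling={schema} */\n{sql}"


star = [Join("sales", "stores", ("store_id",)), Join("sales", "items", ("item_id",))]
sql = ("SELECT s.amount FROM sales s "
       "JOIN stores t ON s.store_id = t.store_id "
       "JOIN items i ON s.item_id = i.item_id")
print(reconstruct_query(sql, star))
```

Testing the single-key case first means a star whose joins all share one key column is labeled `single_key`; a different precedence would be an equally valid assumption.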
-
Publication No.: US20250005075A1
Publication Date: 2025-01-02
Application No.: US18342474
Application Date: 2023-06-27
Applicant: Adobe Inc.
Inventor: Sachin Kumar Chauhan, Subrata MITRA, Sunav CHOUDHARY, Ramasuri NARAYANAM, Koyel MUKHERJEE, Gautam Pratap KOWSHIK
IPC: G06F16/901
Abstract: Tabular data is received. A graph is created based on the tabular data. The graph comprises nodes corresponding to key-value pairs of the tabular data. Weights are assigned to the nodes and to edges that connect the nodes. The node and edge weights are updated using a message-passing neural network (MPNN) framework. The resulting graph is sampled based on the updated weights.
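A small Python sketch of the graph construction and weighted sampling steps, assuming each row is a dict mapping column to value. Nodes are (column, value) pairs, edges link pairs that co-occur in a row, and occurrence counts serve as initial weights (an assumption). The message-passing step is a fixed, hand-written weighted-mean update, not a trained MPNN; it only illustrates how node and edge weights can be refreshed from neighbors. Sampling uses Efraimidis-Spirakis weighted sampling without replacement. The names `build_graph`, `message_passing`, `sample_nodes`, `rounds` and `alpha` are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_graph(rows):
    """Nodes are (column, value) pairs; an edge links two pairs that co-occur in a row."""
    node_w, edge_w = defaultdict(float), defaultdict(float)
    for row in rows:
        nodes = sorted(row.items())  # column names are unique, so values are never compared
        for n in nodes:
            node_w[n] += 1.0
        for a, b in combinations(nodes, 2):
            edge_w[(a, b)] += 1.0
    return dict(node_w), dict(edge_w)


def message_passing(node_w, edge_w, rounds=2, alpha=0.5):
    """Fixed (non-learned) update: mix each node with the edge-weighted mean of its neighbors."""
    for _ in range(rounds):
        msg_sum, w_sum = defaultdict(float), defaultdict(float)
        for (a, b), ew in edge_w.items():
            msg_sum[a] += ew * node_w[b]
            msg_sum[b] += ew * node_w[a]
            w_sum[a] += ew
            w_sum[b] += ew
        updated = {n: (1 - alpha) * w + alpha * (msg_sum[n] / w_sum[n] if w_sum[n] else 0.0)
                   for n, w in node_w.items()}
        total = sum(updated.values())
        node_w = {n: w / total for n, w in updated.items()}
        # Edge update: scale by the mean weight of its endpoints, then renormalize.
        scaled = {e: ew * (node_w[e[0]] + node_w[e[1]]) / 2 for e, ew in edge_w.items()}
        e_total = sum(scaled.values())
        edge_w = {e: ew / e_total for e, ew in scaled.items()} if e_total else scaled
    return node_w, edge_w


def sample_nodes(node_w, k, seed=0):
    """Weighted sampling without replacement: keep the k largest u ** (1 / w) keys."""
    rng = random.Random(seed)
    ranked = sorted(node_w, key=lambda n: rng.random() ** (1.0 / node_w[n]), reverse=True)
    return ranked[:k]


rows = [{"city": "Pune", "plan": "pro"}, {"city": "Pune", "plan": "free"},
        {"city": "Delhi", "plan": "pro"}]
node_w, edge_w = message_passing(*build_graph(rows))
print(sample_nodes(node_w, k=2))
```

In a real MPNN the message and update functions would be learned; swapping them in would not change the surrounding build-then-sample structure.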
-
Publication No.: US20240273296A1
Publication Date: 2024-08-15
Application No.: US18625884
Application Date: 2024-04-03
Applicant: Adobe Inc.
Inventor: Sungchul KIM, Subrata MITRA, Ruiyi Zhang, Rui Wang, Handong ZHAO, Tong YU
IPC: G06F40/295, G06N20/00
CPC classification number: G06F40/295, G06N20/00
Abstract: Embodiments of the technology described herein describe a machine classifier capable of continually learning new classes through a continual few-shot learning approach. A natural language processing (NLP) machine classifier may initially be trained to identify a plurality of other classes through a conventional training process. In order to learn a new class, natural-language training data for the new class is generated. The training data for the new class may be few-shot training data. The training also uses synthetic training data that represents each of the plurality of other classes. The synthetic training data may be generated through a model inversion of the original classifier. The synthetic training data and the natural-language training data are used to retrain the NLP classifier to identify text in the plurality of other classes and in the new class.
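To make the model-inversion idea concrete, here is a minimal pure-Python sketch. It swaps the NLP classifier for a linear softmax classifier over 2-D feature vectors; the class layout, learning rates, the L2 penalty `lam`, and the names `LinearClassifier`, `invert` and `add_class` are hypothetical. Inversion runs gradient ascent on log p(c | x) - lam * ||x||^2 from a small random input, using the softmax gradient W_c - sum_k p_k W_k, and the resulting inputs serve as synthetic training data for the old classes when the classifier is retrained with one extra output for the new class.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LinearClassifier:
    """Softmax regression over dense feature vectors; a stand-in for the NLP classifier."""

    def __init__(self, n_classes, dim):
        self.W = [[0.0] * dim for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def probs(self, x):
        return softmax([sum(w * xi for w, xi in zip(row, x)) + bk
                        for row, bk in zip(self.W, self.b)])

    def predict(self, x):
        p = self.probs(x)
        return p.index(max(p))

    def fit(self, data, epochs=300, lr=0.1):
        """Plain SGD on cross-entropy."""
        for _ in range(epochs):
            for x, y in data:
                p = self.probs(x)
                for k, pk in enumerate(p):
                    g = pk - (1.0 if k == y else 0.0)  # d(loss)/d(logit_k)
                    self.W[k] = [w - lr * g * xi for w, xi in zip(self.W[k], x)]
                    self.b[k] -= lr * g

    def add_class(self):
        """Append a zero-initialized output row for a new class."""
        self.W.append([0.0] * len(self.W[0]))
        self.b.append(0.0)


def invert(model, c, steps=300, lr=0.1, lam=0.01, rng=random):
    """Gradient ascent on log p(c|x) - lam * ||x||^2, starting near the origin."""
    x = [rng.uniform(-0.1, 0.1) for _ in model.W[0]]
    for _ in range(steps):
        p = model.probs(x)
        grad = []
        for i in range(len(x)):
            expected = sum(pk * row[i] for pk, row in zip(p, model.W))
            grad.append(model.W[c][i] - expected - 2 * lam * x[i])
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
    return x


rng = random.Random(0)
old_data = [((-3.0, 0.0), 0), ((-2.5, 0.5), 0), ((-3.5, -0.5), 0),
            ((3.0, 0.0), 1), ((2.5, -0.5), 1), ((3.5, 0.5), 1)]
model = LinearClassifier(n_classes=2, dim=2)
model.fit(old_data)

# Synthetic stand-ins for the old classes, produced from the trained model alone.
synthetic = [(invert(model, c, rng=rng), c) for c in (0, 1) for _ in range(5)]
few_shot = [((0.0, 4.0), 2), ((0.5, 4.5), 2), ((-0.5, 3.5), 2)]

model.add_class()
model.fit(synthetic + few_shot)
print([model.predict(x) for x, _ in few_shot])
```

Retraining sees only the synthetic old-class points and the three few-shot examples; `old_data` is not reused, which is why stand-ins are generated by inversion in the first place.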
-