RELATING DATA IN DATA LAKES
    Patent application

    Publication No.: US20240386002A1

    Publication date: 2024-11-21

    Application No.: US18319748

    Filing date: 2023-05-18

    Applicant: Adobe Inc.

    Abstract: A dataset comprising tables is received. Embeddings are generated for column titles of a table. Based on the embeddings, similar tables are clustered. The tables are organized into smaller clusters based on statistical similarities. Similarity scores are calculated for tables within the same cluster. A relatedness graph is created based on the similarity scores; similar tables are represented by nodes connected by edges. If the similarity score for a pair of tables exceeds a threshold, a table is deleted.
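
The pipeline in this abstract can be illustrated with a minimal sketch. It is not the patented method: the token-count "embedding" stands in for whatever learned embedding the application uses, the greedy clustering and all three thresholds are assumptions, and the statistical sub-clustering step is omitted for brevity. The names `embed_titles`, `cluster_tables`, and `relatedness_graph` are hypothetical. Instead of deleting a table, the sketch only flags near-duplicates.

```python
import math
import re
from collections import Counter
from itertools import combinations


def embed_titles(columns):
    """Toy stand-in for an embedding: token counts over a table's column titles."""
    tokens = []
    for title in columns:
        tokens.extend(re.findall(r"[a-z0-9]+", title.lower()))
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_tables(embeddings, min_sim):
    """Greedy clustering: a table joins the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of table names; the first entry is the seed
    for name, emb in embeddings.items():
        for cluster in clusters:
            if cosine(emb, embeddings[cluster[0]]) >= min_sim:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def relatedness_graph(tables, cluster_sim=0.5, edge_sim=0.5, duplicate_sim=0.95):
    """Return (edges, duplicates): scored edges within clusters and flagged near-duplicates.

    All thresholds are illustrative assumptions, not values from the application.
    """
    embeddings = {name: embed_titles(cols) for name, cols in tables.items()}
    edges = {}
    duplicates = []
    for cluster in cluster_tables(embeddings, cluster_sim):
        # Similarity scores are only computed for tables in the same cluster.
        for a, b in combinations(cluster, 2):
            score = cosine(embeddings[a], embeddings[b])
            if score >= edge_sim:
                edges[(a, b)] = score
            if score > duplicate_sim:
                duplicates.append(b)  # candidate for deletion
    return edges, duplicates


# Hypothetical data lake: table name -> column titles.
tables = {
    "orders": ["order_id", "customer_id", "total"],
    "orders_copy": ["order_id", "customer_id", "total"],
    "customers": ["customer_id", "name", "email"],
    "weather": ["station", "temp_c"],
}
edges, duplicates = relatedness_graph(tables)
print(sorted(edges))
print(duplicates)
```

Clustering first keeps the pairwise comparison from running over every pair of tables in the lake, which is presumably why the abstract scores only tables within the same cluster.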

    TEACHING A MACHINE CLASSIFIER TO RECOGNIZE A NEW CLASS

    Publication No.: US20240273296A1

    Publication date: 2024-08-15

    Application No.: US18625884

    Filing date: 2024-04-03

    Applicant: Adobe Inc.

    CPC classification number: G06F40/295 G06N20/00

    Abstract: Embodiments of the technology described herein describe a machine classifier capable of continually learning new classes through a continual few-shot learning approach. A natural language processing (NLP) machine classifier may initially be trained to identify a plurality of other classes through a conventional training process. In order to learn a new class, natural-language training data for a new class is generated. The training data for the new class may be few-shot training data. The training also uses synthetic training data that represents each of the plurality of other classes. The synthetic training data may be generated through a model inversion of the original classifier. The synthetic training data and the natural-language training data are used to retrain the NLP classifier to identify text in the plurality of other classes and the new class.
