Invention Grant
- Patent Title: Using multiple trained models to reduce data labeling efforts
- Application No.: US18219333
- Application Date: 2023-07-07
- Publication No.: US11983171B2
- Publication Date: 2024-05-14
- Inventor: Matthew Shreve, Francisco E. Torres, Raja Bala, Robert R. Price, Pei Li
- Applicant: Xerox Corporation
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Womble Bond Dickinson (US) LLP
- Main IPC: G06F16/00
- IPC: G06F16/00; G06F16/23; G06N20/00

Abstract:
A method of labeling a dataset includes inputting a testing set comprising a plurality of input data samples into a plurality of pre-trained machine learning models to generate a set of embeddings output by the plurality of pre-trained machine learning models. The method further includes performing an iterative cluster labeling algorithm that includes generating a plurality of clusterings from the set of embeddings, analyzing the plurality of clusterings to identify a target embedding with a highest cluster quality, analyzing the target embedding to determine a compactness for each of the plurality of clusterings of the target embedding, and identifying a target cluster among the plurality of clusterings of the target embedding based on the compactness. The method further includes assigning pseudo-labels to a subset of the plurality of input data samples that are members of the target cluster.
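The abstract names the steps but not the clustering algorithm, the cluster-quality measure, or the compactness measure. The following is a minimal sketch of one labeling round plus an iterative loop. It assumes k-means for clustering, the mean silhouette coefficient as "cluster quality", and mean distance-to-centroid as "compactness". The toy vectors stand in for the outputs of the pre-trained models. All function names (`kmeans`, `silhouette`, `compactness`, `label_round`, `iterative_labeling`), the stop rule, and the use of integer ids as pseudo-labels are hypothetical choices, not taken from the patent.

```python
# Hypothetical sketch: metric and stopping choices are assumptions, not from the patent.
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = centroid(members)
    return labels


def silhouette(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette coefficient (assumed quality metric; higher is better)."""
    if len(set(labels)) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(same) / len(same)
        b = math.inf
        for c in set(labels) - {labels[i]}:
            other = [dist(p, q) for q, lab in zip(points, labels) if lab == c]
            b = min(b, sum(other) / len(other))
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def compactness(points: List[Vector], labels: List[int], c: int) -> float:
    """Assumed compactness: mean distance of cluster c's members to its centroid (lower = tighter)."""
    members = [p for p, lab in zip(points, labels) if lab == c]
    center = centroid(members)
    return sum(dist(p, center) for p in members) / len(members)


def label_round(embeddings: Dict[str, List[Vector]], k: int) -> Tuple[str, List[int]]:
    """Pick the best-clustered embedding, then return the indices of its most compact cluster."""
    best = None  # (quality, model name, cluster labels)
    for name, points in embeddings.items():
        labels = kmeans(points, k)
        quality = silhouette(points, labels)
        if best is None or quality > best[0]:
            best = (quality, name, labels)
    _, name, labels = best
    points = embeddings[name]
    target = min(sorted(set(labels)), key=lambda c: compactness(points, labels, c))
    return name, [i for i, lab in enumerate(labels) if lab == target]


def iterative_labeling(embeddings: Dict[str, List[Vector]], k: int, rounds: int) -> Dict[int, int]:
    """Repeatedly pseudo-label the tightest cluster and drop its members from the pool."""
    n = len(next(iter(embeddings.values())))
    remaining = list(range(n))
    pseudo: Dict[int, int] = {}
    for label in range(rounds):
        if len(remaining) < 2 * k:  # assumed stop rule: too few samples left
            break
        subset = {m: [pts[i] for i in remaining] for m, pts in embeddings.items()}
        _, members = label_round(subset, k)
        for local in members:
            pseudo[remaining[local]] = label  # integer id stands in for a real class label
        remaining = [idx for idx in remaining if idx not in pseudo]
    return pseudo


embeddings = {
    "model_a": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)],
    "model_b": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)],
}
print(label_round(embeddings, k=2))  # ('model_a', [0, 1, 2])
print(iterative_labeling(embeddings, k=2, rounds=3))  # {0: 0, 1: 0, 2: 0}
```

In this toy data, `model_a` separates the samples far more cleanly than `model_b`, so it wins on silhouette. Its tight group near the origin is then pseudo-labeled first. Labeling the most compact cluster of the best embedding first is presumably where a single shared label is least likely to be wrong. Later rounds then work on the samples that remain.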
Public/Granted literature:
- US20230350880A1 USING MULTIPLE TRAINED MODELS TO REDUCE DATA LABELING EFFORTS (Public/Granted day: 2023-11-02)