Publication No.: US12288140B2
Publication Date: 2025-04-29
Application No.: US17150524
Application Date: 2021-01-15
Applicant: Microsoft Technology Licensing, LLC
Inventor: Soyoung Peraud, Alexandre Rochette, Gabriel Arien Desgarennes, Niel Chah, Abhishek Kumar, Timothy James Hazen
IPC: G06N3/09, G06F18/22, G06F18/23, G06F18/2411, G06F18/2431, G06N3/084, G06N20/00
Abstract: A classifier may be trained with less than all datasets manually annotated with labels. A small subset of verbatims may be manually labeled with topic labels as seeds. Data augmentations can be used to acquire seed verbatim sets for known topics and to assign temporary pseudo labels to the rest of the verbatims based on their vector space proximity to the labeled seed verbatims. The training may involve classification epochs during which embeddings are updated with the assumption that the pseudo labels are ground-truth labels. The training may also involve labeling epochs during which the updated embeddings are used to update the vectors corresponding to the verbatims, and pseudo labels are updated based on updated vector coordinates in the vector space. As the training process progresses through the epochs, the embeddings will converge.
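Illustrative sketch (not taken from the patent text): the alternation between labeling epochs and classification epochs described in the abstract can be pictured with a toy example. Everything here is an assumption made for illustration: the 2-D vectors, the topic names "billing" and "login", the function names, and above all the update rule. In place of a real neural classifier trained by backpropagation, the classification epoch simply pulls each vector toward the mean of its pseudo-labeled class, which stands in for an embedding update that treats pseudo labels as ground truth.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def labeling_epoch(vectors, seeds):
    """Assign pseudo labels by proximity to the seed centroid of each topic.

    `seeds` maps a verbatim index to its manually assigned topic label.
    Seed verbatims always keep their manual label.
    """
    by_topic = {}
    for idx, topic in seeds.items():
        by_topic.setdefault(topic, []).append(vectors[idx])
    centroids = {t: centroid(vs) for t, vs in by_topic.items()}
    labels = {}
    for i, v in enumerate(vectors):
        if i in seeds:
            labels[i] = seeds[i]
        else:
            labels[i] = min(centroids, key=lambda t: math.dist(v, centroids[t]))
    return labels

def classification_epoch(vectors, labels, lr=0.5):
    """Stand-in for a training step (assumption, not the patented update).

    Treats the pseudo labels as ground truth and moves each vector a
    fraction `lr` of the way toward the mean of its labeled class.
    """
    by_topic = {}
    for i, t in labels.items():
        by_topic.setdefault(t, []).append(vectors[i])
    means = {t: centroid(vs) for t, vs in by_topic.items()}
    return [
        [x + lr * (m - x) for x, m in zip(v, means[labels[i]])]
        for i, v in enumerate(vectors)
    ]

def train(vectors, seeds, epochs=5):
    """Alternate classification and labeling epochs for a fixed number of rounds."""
    labels = labeling_epoch(vectors, seeds)
    for _ in range(epochs):
        vectors = classification_epoch(vectors, labels)
        labels = labeling_epoch(vectors, seeds)
    return vectors, labels

# Hypothetical data: six verbatim vectors, two of them manually labeled seeds.
VECTORS = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
           [5.0, 5.0], [4.8, 5.2], [5.1, 4.7]]
SEEDS = {0: "billing", 3: "login"}

final_vectors, final_labels = train(VECTORS, SEEDS)
print(final_labels)
```

In this toy setting, only two of the six verbatims carry manual labels; the rest receive pseudo labels from their proximity to the seeds, and the vectors in each class draw closer together with every round, loosely mirroring the convergence the abstract describes.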