SYSTEMS AND METHODS RELATING TO KNOWLEDGE DISTILLATION IN NATURAL LANGUAGE PROCESSING MODELS

    Publication No.: US20230196024A1

    Publication Date: 2023-06-22

    Application No.: US17557245

    Filing Date: 2021-12-21

    CPC classification number: G06F40/30 G06N5/02 G06N3/0454

    Abstract: A method for creating a student model from a teacher model for knowledge distillation. The method may include: providing the teacher model trained on a first training dataset; generating candidate student models, wherein each candidate student model has a unique permutation of layers derived by randomly selecting one or more layers of the plurality of layers of the teacher model for removal; generating a second training dataset; for each of the candidate student models: providing the second training dataset as inputs, recording the outputs generated, and, based on the recorded outputs, evaluating its performance according to a predetermined model evaluation criterion; determining which of the candidate student models performed best based on the predetermined model evaluation criterion; and identifying that best-performing candidate as the preferred student model.
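The candidate-generation and evaluation loop in the abstract can be sketched in a few lines. This is a minimal illustrative sketch, not the patented implementation: the function names (`make_candidates`, `evaluate`), the toy layers, and the mean-squared-error criterion standing in for the "predetermined model evaluation criterion" are all assumptions.

```python
import random

def make_candidates(teacher_layers, num_candidates, rng):
    """Build candidate students: each is the teacher with a unique,
    randomly selected set of one or more layers removed (illustrative)."""
    seen = set()
    candidates = []
    while len(candidates) < num_candidates:
        k = rng.randint(1, len(teacher_layers) - 1)      # how many layers to drop
        drop = tuple(sorted(rng.sample(range(len(teacher_layers)), k)))
        if drop in seen:
            continue                                     # keep each layer set unique
        seen.add(drop)
        candidates.append([l for i, l in enumerate(teacher_layers) if i not in drop])
    return candidates

def run(layers, x):
    """Pass an input through a stack of layers."""
    for layer in layers:
        x = layer(x)
    return x

def evaluate(candidate, teacher_layers, dataset):
    """Score a candidate by mean squared error against the teacher's outputs
    (a stand-in for the patent's evaluation criterion)."""
    errs = [(run(candidate, x) - run(teacher_layers, x)) ** 2 for x in dataset]
    return sum(errs) / len(errs)

rng = random.Random(0)
# Toy "teacher": four simple numeric layers (assumption; real layers would be NN modules).
teacher = [lambda x: x * 2, lambda x: x + 1, lambda x: x * 0.5, lambda x: x - 3]
dataset = [float(v) for v in range(10)]                  # stands in for the second training dataset

candidates = make_candidates(teacher, num_candidates=4, rng=rng)
scores = [evaluate(c, teacher, dataset) for c in candidates]
best = candidates[scores.index(min(scores))]             # the preferred candidate student model
```

The sketch keeps every candidate's removed-layer set distinct, mirroring the "unique permutation of layers" language, and selects the lowest-error candidate as the preferred student.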
