DEVELOPING A PROGRAMMING LANGUAGE MODEL FOR MACHINE LEARNING TASKS

    Publication No.: US20250130780A1

    Publication Date: 2025-04-24

    Application No.: US18382018

    Filing Date: 2023-10-19

    Abstract: A method develops a programming language model for machine learning tasks. The method includes adjusting a token list to include a language token used by a tokenizer for a pretrained language model. The pretrained language model includes a set of layers. The set of layers includes a set of initial layers, an embedding layer, and an output layer. The method further includes performing an output layer modification of the output layer to replace an output vector with an embedding vector. The method further includes freezing the set of initial layers to generate a set of frozen layers of the pretrained language model that do not update during training. The method further includes training the pretrained language model using the language token, the output layer modification, and the set of frozen layers to form a fine-tuned model from the pretrained language model.
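
The three preparation steps in this abstract (adjusting the token list, modifying the output layer, freezing the initial layers) can be sketched on a toy model. This is only an illustration under stated assumptions, not the claimed implementation: the Layer and ToyModel classes, the "<lang:python>" token, and the reading of "replace an output vector with an embedding vector" as sharing the embedding weights with the output layer are all hypothetical. The training step itself is omitted.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Layer:
    # Hypothetical stand-in for a model layer: a weight matrix plus a flag
    # saying whether training may update it.
    name: str
    weights: list[list[float]]
    trainable: bool = True


@dataclass
class ToyModel:
    token_list: list[str]
    embedding: Layer
    initial_layers: list[Layer]
    output: Layer


def build_toy_model(dim: int = 4) -> ToyModel:
    """Build a tiny 'pretrained' model with a three-token vocabulary."""
    tokens = ["<pad>", "def", "return"]
    return ToyModel(
        token_list=tokens,
        embedding=Layer("embedding", [[0.1] * dim for _ in tokens]),
        initial_layers=[Layer(f"block{i}", [[1.0] * dim]) for i in range(2)],
        output=Layer("output", [[0.2] * dim for _ in tokens]),
    )


def add_language_token(model: ToyModel, token: str, dim: int = 4) -> int:
    """Add a language token to the token list and give it an embedding row."""
    if token not in model.token_list:
        model.token_list.append(token)
        model.embedding.weights.append([0.0] * dim)
    return model.token_list.index(token)


def tie_output_to_embedding(model: ToyModel) -> None:
    # Assumed interpretation of the output layer modification: the output
    # layer reuses the embedding vectors instead of its own output vectors.
    model.output.weights = model.embedding.weights


def freeze_initial_layers(model: ToyModel) -> list[str]:
    """Mark the initial layers as frozen so training would not update them."""
    for layer in model.initial_layers:
        layer.trainable = False
    return [layer.name for layer in model.initial_layers]


def trainable_layers(model: ToyModel) -> list[Layer]:
    """Return the layers a fine-tuning loop would still update."""
    layers = [model.embedding, *model.initial_layers, model.output]
    return [layer for layer in layers if layer.trainable]
```

In this sketch, sharing the weights also keeps the output layer the same size as the extended token list, so the new language token gets an output row without a separate resize step.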

    BINARY DETECTION IN SOFTWARE
    Invention application

    Publication No.: US20250110854A1

    Publication Date: 2025-04-03

    Application No.: US18476608

    Filing Date: 2023-09-28

    Abstract: A method includes disassembling a reference binary of a library to generate a control flow graph of the reference binary, normalizing the control flow graph to generate a normalized graph, traversing the normalized graph to generate execution traces from the normalized graph, and generating library vector embeddings. Generating library vector embeddings includes, for each execution trace of at least a subset of the execution traces, processing the execution trace by a vector embedding model to generate a library vector embedding of the execution trace. The method further includes relating, in storage, a library identifier of the library to the library vector embeddings as a fingerprint of the library.
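
The pipeline in this abstract (normalize the control flow graph, traverse it into execution traces, embed each trace, store the embeddings under a library identifier) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed method: the CFG dictionary stands in for real disassembler output, the register list and placeholder names are invented for illustration, and the hashed bag-of-instructions embed function is a simple substitute for the vector embedding model, which the abstract does not specify.

```python
import hashlib
import re

# Hypothetical disassembly result: block id -> (instructions, successor ids).
CFG = {
    "b0": (["mov rax, 0x10", "cmp rax, rbx", "jne 0x401020"], ["b1", "b2"]),
    "b1": (["add rax, 0x1", "jmp 0x401030"], ["b3"]),
    "b2": (["sub rax, 0x2"], ["b3"]),
    "b3": (["ret"], []),
}


def normalize(cfg):
    """Replace concrete immediates/addresses and registers with placeholders."""
    def norm(ins):
        ins = re.sub(r"0x[0-9a-f]+", "IMM", ins)
        return re.sub(r"\b(rax|rbx|rcx|rdx)\b", "REG", ins)
    return {b: ([norm(i) for i in ins], succ) for b, (ins, succ) in cfg.items()}


def traces(cfg, entry, limit=32):
    """Enumerate up to `limit` acyclic paths from the entry block.

    A path ends at a block with no successors that are not already on the
    path, so loops are cut instead of followed forever.
    """
    out = []
    stack = [(entry, [entry])]
    while stack and len(out) < limit:
        block, path = stack.pop()
        succs = [s for s in cfg[block][1] if s not in path]
        if not succs:
            out.append([ins for b in path for ins in cfg[b][0]])
        for s in succs:
            stack.append((s, path + [s]))
    return out


def embed(trace, dim=16):
    """Stand-in embedding: hashed bag of instructions, L2-normalized."""
    vec = [0.0] * dim
    for ins in trace:
        h = int.from_bytes(hashlib.sha256(ins.encode()).digest()[:4], "big")
        vec[h % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def fingerprint(store, library_id, cfg, entry):
    """Relate a library identifier to the embeddings of its traces."""
    store[library_id] = [embed(t) for t in traces(normalize(cfg), entry)]
    return store[library_id]
```

Because normalization removes concrete addresses before embedding, two builds of the same code that differ only in jump targets produce the same fingerprint in this sketch; a learned embedding model would be needed to tolerate larger differences.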
