Validation-based determination of computational models

    Publication number: US11811821B2

    Publication date: 2023-11-07

    Application number: US17087194

    Filing date: 2020-11-02

    CPC classification number: H04L63/145 G06F21/56 G06N20/00 H04L63/1416

    Abstract: Example techniques described herein determine a validation dataset, determine a computational model using the validation dataset, or determine a signature or classification of a data stream such as a file. The classification can indicate whether the data stream is associated with malware. A processing unit can determine signatures of individual training data streams. The processing unit can determine, based at least in part on the signatures and a predetermined difference criterion, a training set and a validation set of the training data streams. The processing unit can determine a computational model based at least in part on the training set. The processing unit can then operate the computational model based at least in part on a trial data stream to provide a trial model output. Some examples include determining the validation set based at least in part on the training set and the predetermined criterion for difference between data streams.
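
    The split described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: it assumes signatures are fixed-length numeric vectors, that the "predetermined difference criterion" is a minimum Euclidean distance, and that the training set is simply the first `n_train` streams. The names `split_by_difference`, `n_train`, and `min_distance` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Signature = Sequence[float]


def euclidean(a: Signature, b: Signature) -> float:
    """Euclidean distance between two signatures of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_by_difference(
    signatures: List[Signature], n_train: int, min_distance: float
) -> Tuple[List[int], List[int]]:
    """Return (training indices, validation indices).

    Assumption: the first n_train streams form the training set. A remaining
    stream joins the validation set only if its signature is at least
    min_distance away from every training signature, so validation data
    is not a near-duplicate of training data.
    """
    train = list(range(min(n_train, len(signatures))))
    validation = []
    for i in range(len(train), len(signatures)):
        if all(
            euclidean(signatures[i], signatures[j]) >= min_distance
            for j in train
        ):
            validation.append(i)
    return train, validation


# Hypothetical two-dimensional signatures for four data streams.
example_signatures = [(0.0, 0.0), (0.0, 1.0), (0.1, 0.1), (5.0, 5.0)]
train_idx, val_idx = split_by_difference(
    example_signatures, n_train=2, min_distance=1.0
)
print(train_idx, val_idx)
```

    In this sketch the third stream is dropped from validation because it sits too close to a training signature, which is the kind of leakage a difference criterion is meant to avoid.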

    Computational modeling and classification of data streams

    Publication number: US10832168B2

    Publication date: 2020-11-10

    Application number: US15402524

    Filing date: 2017-01-10

    Abstract: Example techniques described herein determine a signature or classification of a data stream such as a file. The classification can indicate whether the data stream is associated with malware. A processor can locate training analysis regions of training data streams based on predetermined structure data, and determine training model inputs based on the training analysis regions. The processor can determine a computational model based on the training model inputs. The computational model can receive an input vector and provide a corresponding feature vector. The processor can then locate a trial analysis region of a trial data stream based on the predetermined structure data and determine a trial model input. The processor can operate the computational model based on the trial model input to provide a trial feature vector, e.g., a signature. The processor can operate a second computational model to provide a classification based on the signature.

    EMBEDDING NETWORKS TO EXTRACT MALWARE FAMILY INFORMATION

    Publication number: US20210256401A1

    Publication date: 2021-08-19

    Application number: US17177220

    Filing date: 2021-02-17

    Inventor: David Elkind

    Abstract: Methods and systems are provided for training a machine learning model to embed feature vectors in a feature space which magnifies distances between discriminating features of different malware families. In a labeled family dataset, labeled features which discriminate between different families are embedded in a feature space on a triplet loss function. Training may be performed in phases, starting by excluding hardest-positive and hardest-negative data points to provide reliable feature embeddings for initializing subsequent, more difficult phases. By training an embedding learning model to distinguish labeled malware families apart from training a classification learning model, the trained feature embedding may boost performance of classification learning models with regard to novel malware families which can only be distinguished by novel features. Consequently, these techniques enable enhanced performance of classification of novel malware families, which may further be provided as a service on a cloud computing system.
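
    As a rough illustration of the triplet loss and the easier first training phase mentioned in this abstract, consider the sketch below. It assumes Euclidean distance, a hinge-style triplet loss with a margin, and reads "hardest positive" as the farthest same-family point and "hardest negative" as the nearest other-family point. The helper `easy_phase_triplet` is hypothetical and is not taken from the application.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Vector, positive: Vector, negative: Vector, margin: float = 1.0
) -> float:
    """Hinge-style triplet loss: zero once the negative is farther from
    the anchor than the positive by at least the margin."""
    return max(
        0.0, distance(anchor, positive) - distance(anchor, negative) + margin
    )


def easy_phase_triplet(
    anchor_idx: int, embeddings: List[Vector], labels: List[str]
) -> Optional[Tuple[int, int]]:
    """Pick (positive index, negative index) for an anchor while skipping
    the hardest positive and hardest negative when alternatives exist.
    Returns None if the anchor has no positive or no negative."""
    a = embeddings[anchor_idx]
    by_dist = lambda i: distance(a, embeddings[i])
    positives = sorted(
        (i for i in range(len(labels))
         if i != anchor_idx and labels[i] == labels[anchor_idx]),
        key=by_dist,
    )
    negatives = sorted(
        (i for i in range(len(labels)) if labels[i] != labels[anchor_idx]),
        key=by_dist,
    )
    if not positives or not negatives:
        return None
    # Hardest positive is the farthest one (last); hardest negative is the
    # nearest one (first). Skip each when a second choice is available.
    pos = positives[-2] if len(positives) > 1 else positives[-1]
    neg = negatives[1] if len(negatives) > 1 else negatives[0]
    return pos, neg


# Hypothetical embeddings for two malware families, "a" and "b".
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 4.0), (0.0, 2.0), (0.0, 6.0)]
families = ["a", "a", "a", "b", "b"]
print(easy_phase_triplet(0, points, families))
print(triplet_loss(points[0], points[1], points[4]))
```

    A later, harder phase would stop skipping those extreme points; how the phases are actually scheduled is not specified in the abstract.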

    MALWARE DETECTION USING LOCAL COMPUTATIONAL MODELS

    Publication number: US20190026466A1

    Publication date: 2019-01-24

    Application number: US15657379

    Filing date: 2017-07-24

    Abstract: Example techniques herein determine that a trial data stream is associated with malware (“dirty”) using a local computational model (CM). The data stream can be represented by a feature vector. A control unit can receive a first, dirty feature vector (e.g., a false miss) and determine the local CM based on the first feature vector. The control unit can receive a trial feature vector representing the trial data stream. The control unit can determine that the trial data stream is dirty if a broad CM or the local CM determines that the trial feature vector is dirty. In some examples, the local CM can define a dirty region in a feature space. The control unit can determine the local CM based on the first feature vector and other clean or dirty feature vectors, e.g., a clean feature vector nearest to the first feature vector.
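
    One way to picture the combination of a broad CM and a local CM is the sketch below. It is an assumption-laden illustration, not the claimed design: the local model's "dirty region" is taken to be a ball around the known dirty feature vector, with a radius set to a fraction (`shrink`, hypothetical) of the distance to the nearest clean feature vector, and the broad model is an arbitrary stand-in callable.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], bool]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def make_local_model(
    dirty_vector: Vector, clean_vectors: List[Vector], shrink: float = 0.5
) -> Model:
    """Build a local model whose dirty region is a ball centered on a known
    dirty vector (e.g., a false miss of the broad model).

    Assumption: the radius is shrink times the distance to the nearest clean
    vector, so the region stops short of known-clean samples. Requires at
    least one clean vector.
    """
    nearest_clean = min(distance(dirty_vector, c) for c in clean_vectors)
    radius = shrink * nearest_clean

    def local_model(v: Vector) -> bool:
        return distance(v, dirty_vector) <= radius

    return local_model


def is_dirty(v: Vector, broad_model: Model, local_model: Model) -> bool:
    """A trial vector is dirty if either model flags it."""
    return broad_model(v) or local_model(v)


# Stand-in broad model: flags vectors whose first feature exceeds 10.
broad = lambda v: v[0] > 10
local = make_local_model((1.0, 1.0), [(1.0, 3.0), (5.0, 5.0)])

print(is_dirty((1.0, 1.5), broad, local))   # inside the local dirty region
print(is_dirty((1.0, 2.5), broad, local))   # outside both models
print(is_dirty((11.0, 0.0), broad, local))  # flagged by the broad model
```

    The OR combination means the local model can only add detections near the known miss; it never overrides a dirty verdict from the broad model.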

    Validation-based determination of computational models

    Publication number: US10826934B2

    Publication date: 2020-11-03

    Application number: US15402503

    Filing date: 2017-01-10

    Abstract: Example techniques described herein determine a validation dataset, determine a computational model using the validation dataset, or determine a signature or classification of a data stream such as a file. The classification can indicate whether the data stream is associated with malware. A processing unit can determine signatures of individual training data streams. The processing unit can determine, based at least in part on the signatures and a predetermined difference criterion, a training set and a validation set of the training data streams. The processing unit can determine a computational model based at least in part on the training set. The processing unit can then operate the computational model based at least in part on a trial data stream to provide a trial model output. Some examples include determining the validation set based at least in part on the training set and the predetermined criterion for difference between data streams.

    CLASSIFICATION OF SOURCE DATA BY NEURAL NETWORK PROCESSING

    Publication number: US20190273509A1

    Publication date: 2019-09-05

    Application number: US15909372

    Filing date: 2018-03-01

    Abstract: Example techniques described herein determine a classification of variable-length source data such as executable code. A neural network system that includes a convolution filter, a recurrent neural network, and a fully connected layer can be configured in a computing device to classify executable code. The neural network system can receive executable code of variable length and reduce its dimensionality by generating a variable-length sequence of features extracted from the executable code. The sequence of features is filtered, and applied to one or more recurrent neural networks and to a neural network. The output of the neural network classifies the data. Other disclosed systems include a system for reducing the dimensionality of command line input using a recurrent neural network. The reduced dimensionality of command line input may be classified using the disclosed neural network systems.

    Malware detection using local computational models

    Publication number: US10726128B2

    Publication date: 2020-07-28

    Application number: US15657379

    Filing date: 2017-07-24

    Abstract: Example techniques herein determine that a trial data stream is associated with malware (“dirty”) using a local computational model (CM). The data stream can be represented by a feature vector. A control unit can receive a first, dirty feature vector (e.g., a false miss) and determine the local CM based on the first feature vector. The control unit can receive a trial feature vector representing the trial data stream. The control unit can determine that the trial data stream is dirty if a broad CM or the local CM determines that the trial feature vector is dirty. In some examples, the local CM can define a dirty region in a feature space. The control unit can determine the local CM based on the first feature vector and other clean or dirty feature vectors, e.g., a clean feature vector nearest to the first feature vector.

    CLASSIFICATION OF SOURCE DATA BY NEURAL NETWORK PROCESSING

    Publication number: US20190273510A1

    Publication date: 2019-09-05

    Application number: US15909442

    Filing date: 2018-03-01

    Abstract: Example techniques described herein determine a classification of variable-length source data such as executable code. A neural network system that includes a convolution filter, a recurrent neural network, and a fully connected layer can be configured in a computing device to classify executable code. The neural network system can receive executable code of variable length and reduce its dimensionality by generating a variable-length sequence of features extracted from the executable code. The sequence of features is filtered, and applied to one or more recurrent neural networks and to a neural network. The output of the neural network classifies the data. Other disclosed systems include a system for reducing the dimensionality of command line input using a recurrent neural network. The reduced dimensionality of command line input may be classified using the disclosed neural network systems.

    COMPUTATIONAL MODELING AND CLASSIFICATION OF DATA STREAMS

    Publication number: US20180197089A1

    Publication date: 2018-07-12

    Application number: US15402524

    Filing date: 2017-01-10

    Abstract: Example techniques described herein determine a signature or classification of a data stream such as a file. The classification can indicate whether the data stream is associated with malware. A processor can locate training analysis regions of training data streams based on predetermined structure data, and determine training model inputs based on the training analysis regions. The processor can determine a computational model based on the training model inputs. The computational model can receive an input vector and provide a corresponding feature vector. The processor can then locate a trial analysis region of a trial data stream based on the predetermined structure data and determine a trial model input. The processor can operate the computational model based on the trial model input to provide a trial feature vector, e.g., a signature. The processor can operate a second computational model to provide a classification based on the signature.

    VALIDATION-BASED DETERMINATION OF COMPUTATIONAL MODELS

    Publication number: US20210075798A1

    Publication date: 2021-03-11

    Application number: US17087194

    Filing date: 2020-11-02

    Abstract: Example techniques described herein determine a validation dataset, determine a computational model using the validation dataset, or determine a signature or classification of a data stream such as a file. The classification can indicate whether the data stream is associated with malware. A processing unit can determine signatures of individual training data streams. The processing unit can determine, based at least in part on the signatures and a predetermined difference criterion, a training set and a validation set of the training data streams. The processing unit can determine a computational model based at least in part on the training set. The processing unit can then operate the computational model based at least in part on a trial data stream to provide a trial model output. Some examples include determining the validation set based at least in part on the training set and the predetermined criterion for difference between data streams.
