-
Publication number: US11811821B2
Publication date: 2023-11-07
Application number: US17087194
Filing date: 2020-11-02
Applicant: CrowdStrike, Inc.
Inventor: Sven Krasser, David Elkind, Brett Meyer, Patrick Crenshaw
CPC classification number: H04L63/145, G06F21/56, G06N20/00, H04L63/1416
Abstract: Example techniques described herein determine a validation dataset, determine a computational model using the validation dataset, or determine a signature or classification of a data stream such as a file. The classification can indicate whether the data stream is associated with malware. A processing unit can determine signatures of individual training data streams. The processing unit can determine, based at least in part on the signatures and a predetermined difference criterion, a training set and a validation set of the training data streams. The processing unit can determine a computational model based at least in part on the training set. The processing unit can then operate the computational model based at least in part on a trial data stream to provide a trial model output. Some examples include determining the validation set based at least in part on the training set and the predetermined criterion for difference between data streams.
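The signature-based split described in this abstract can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the patented method: signatures are taken to be plain numeric tuples, the "predetermined difference criterion" is assumed to be a minimum Euclidean distance, and the names `split_by_signature` and `min_distance` are hypothetical.

```python
import math

def split_by_signature(signatures, train_ids, min_distance):
    """Keep train_ids as the training set; admit a remaining stream to the
    validation set only if its signature is at least min_distance away
    from every training signature (assumed difference criterion)."""
    train = list(train_ids)
    validation = []
    for stream_id, sig in signatures.items():
        if stream_id in train:
            continue
        nearest = min(math.dist(sig, signatures[t]) for t in train)
        if nearest >= min_distance:
            validation.append(stream_id)
    return train, validation

# Hypothetical 2-D signatures for four training data streams.
signatures = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0), "d": (0.0, 0.2)}
train, validation = split_by_signature(signatures, ["a"], min_distance=1.0)
```

In this toy run, streams "b" and "d" sit too close to the training stream "a" to serve as validation data, so only "c" is admitted. The point of such a criterion is that validation items near-identical to training items would overstate how well the model generalizes.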
-
Publication number: US10832168B2
Publication date: 2020-11-10
Application number: US15402524
Filing date: 2017-01-10
Applicant: CrowdStrike, Inc.
Inventor: Sven Krasser, David Elkind, Patrick Crenshaw, Brett Meyer
IPC: G06N99/00, H04N21/44, G06N3/08, G06F9/00, G06T5/20, G06N20/00, H04L12/24, H04L29/06, G06F21/56, G06N3/04, G06N20/10
Abstract: Example techniques described herein determine a signature or classification of a data stream such as a file. The classification can indicate whether the data stream is associated with malware. A processor can locate training analysis regions of training data streams based on predetermined structure data, and determine training model inputs based on the training analysis regions. The processor can determine a computational model based on the training model inputs. The computational model can receive an input vector and provide a corresponding feature vector. The processor can then locate a trial analysis region of a trial data stream based on the predetermined structure data and determine a trial model input. The processor can operate the computational model based on the trial model input to provide a trial feature vector, e.g., a signature. The processor can operate a second computational model to provide a classification based on the signature.
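The first two steps of this abstract (locating an analysis region from structure data, then deriving a model input from it) might look like the sketch below. Everything concrete here is an assumption for illustration: the 8-byte header layout, the function names, and the choice of a normalized byte histogram as the model input are hypothetical and are not taken from the patent. The computational model that maps the input to a feature vector is not shown.

```python
import struct

def locate_analysis_region(data: bytes) -> bytes:
    """Assumed structure data: bytes 0-3 hold the region offset and
    bytes 4-7 hold the region length, both little-endian uint32."""
    offset, length = struct.unpack_from("<II", data, 0)
    return data[offset:offset + length]

def model_input(region: bytes) -> list:
    """Assumed model input: a 256-bin byte histogram normalized to sum to 1."""
    hist = [0.0] * 256
    for b in region:
        hist[b] += 1.0
    total = len(region) or 1  # avoid division by zero for an empty region
    return [count / total for count in hist]

# Hypothetical stream: header, a 4-byte region, then unrelated trailing bytes.
stream = struct.pack("<II", 8, 4) + b"\x00\x00\xff\xff" + b"trailing"
region = locate_analysis_region(stream)
vector = model_input(region)
```

The same two functions would be applied to training streams and to a trial stream, so that the model sees inputs drawn from the same structural location in both cases.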
-
Publication number: US20210256401A1
Publication date: 2021-08-19
Application number: US17177220
Filing date: 2021-02-17
Applicant: CrowdStrike, Inc.
Inventor: David Elkind
Abstract: Methods and systems are provided for training a machine learning model to embed feature vectors in a feature space which magnifies distances between discriminating features of different malware families. In a labeled family dataset, labeled features which discriminate between different families are embedded in a feature space using a triplet loss function. Training may be performed in phases, starting by excluding hardest-positive and hardest-negative data points to provide reliable feature embeddings for initializing subsequent, more difficult phases. By training an embedding learning model to distinguish labeled malware families separately from training a classification learning model, the trained feature embedding may boost performance of classification learning models with regard to novel malware families which can only be distinguished by novel features. Consequently, these techniques enable enhanced performance of classification of novel malware families, which may further be provided as a service on a cloud computing system.
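A small sketch can make the triplet loss and the phased exclusion of hardest examples concrete. It assumes Euclidean distance and the standard margin-based triplet loss; the helper names `triplet_loss` and `select_triplet`, the margin value, and the rule of skipping exactly one hardest point per side are hypothetical choices, not details from the application.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer than the negative by a margin."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)

def select_triplet(anchor, positives, negatives, skip_hardest):
    """Hardest positive = farthest same-family point; hardest negative =
    nearest other-family point. An early phase skips both (assumed rule)."""
    pos = sorted(positives, key=lambda p: math.dist(anchor, p), reverse=True)
    neg = sorted(negatives, key=lambda n: math.dist(anchor, n))
    kp = 1 if skip_hardest and len(pos) > 1 else 0
    kn = 1 if skip_hardest and len(neg) > 1 else 0
    return pos[kp], neg[kn]

anchor = (0.0, 0.0)
positives = [(1.0, 0.0), (3.0, 0.0)]   # same malware family as the anchor
negatives = [(2.0, 0.0), (5.0, 0.0)]   # other families

early = triplet_loss(anchor, *select_triplet(anchor, positives, negatives, True))
late = triplet_loss(anchor, *select_triplet(anchor, positives, negatives, False))
```

With these toy points the early phase yields zero loss while the later phase, which admits the hardest pair, yields a positive loss. That ordering is the intuition behind starting with easier triplets before moving to harder ones.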
-
Publication number: US20190026466A1
Publication date: 2019-01-24
Application number: US15657379
Filing date: 2017-07-24
Applicant: CrowdStrike, Inc.
Inventor: Sven Krasser, David Elkind, Patrick Crenshaw, Kirby James Koster
Abstract: Example techniques herein determine that a trial data stream is associated with malware (“dirty”) using a local computational model (CM). The data stream can be represented by a feature vector. A control unit can receive a first, dirty feature vector (e.g., a false miss) and determine the local CM based on the first feature vector. The control unit can receive a trial feature vector representing the trial data stream. The control unit can determine that the trial data stream is dirty if a broad CM or the local CM determines that the trial feature vector is dirty. In some examples, the local CM can define a dirty region in a feature space. The control unit can determine the local CM based on the first feature vector and other clean or dirty feature vectors, e.g., a clean feature vector nearest to the first feature vector.
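One way to picture the local CM is as a small "dirty region" around the missed sample, consulted alongside the broad CM. The sketch below assumes the region is a ball whose radius is a fraction of the distance to the nearest clean feature vector; that geometry, the `fraction` parameter, the function names, and the toy broad model are all hypothetical.

```python
import math

def make_local_cm(missed_dirty, clean_vectors, fraction=0.5):
    """Build a local model: a ball around the missed dirty vector whose
    radius is a fraction of the distance to the nearest clean vector."""
    radius = fraction * min(math.dist(missed_dirty, c) for c in clean_vectors)
    return lambda v: math.dist(v, missed_dirty) <= radius

def is_dirty(v, broad_cm, local_cm):
    """A trial vector is dirty if either the broad or the local model says so."""
    return broad_cm(v) or local_cm(v)

# Hypothetical broad model that misses the dirty sample at (1, 1).
broad_cm = lambda v: v[0] > 10
local_cm = make_local_cm((1.0, 1.0), [(5.0, 1.0), (1.0, 9.0)])

near_miss = is_dirty((2.0, 1.0), broad_cm, local_cm)
far_clean = is_dirty((4.5, 1.0), broad_cm, local_cm)
broad_hit = is_dirty((11.0, 0.0), broad_cm, local_cm)
```

Bounding the region by the nearest clean vector is one simple way to keep the local model from flagging known-clean samples while still covering close variants of the missed one.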
-
Publication number: US10826934B2
Publication date: 2020-11-03
Application number: US15402503
Filing date: 2017-01-10
Applicant: CrowdStrike, Inc.
Inventor: Sven Krasser, David Elkind, Brett Meyer, Patrick Crenshaw
Abstract: Example techniques described herein determine a validation dataset, determine a computational model using the validation dataset, or determine a signature or classification of a data stream such as a file. The classification can indicate whether the data stream is associated with malware. A processing unit can determine signatures of individual training data streams. The processing unit can determine, based at least in part on the signatures and a predetermined difference criterion, a training set and a validation set of the training data streams. The processing unit can determine a computational model based at least in part on the training set. The processing unit can then operate the computational model based at least in part on a trial data stream to provide a trial model output. Some examples include determining the validation set based at least in part on the training set and the predetermined criterion for difference between data streams.
-
Publication number: US20190273509A1
Publication date: 2019-09-05
Application number: US15909372
Filing date: 2018-03-01
Applicant: CrowdStrike, Inc.
Inventor: David Elkind, Patrick Crenshaw, Sven Krasser
Abstract: Example techniques described herein determine a classification of variable-length source data such as executable code. A neural network system that includes a convolution filter, a recurrent neural network, and a fully connected layer can be configured in a computing device to classify executable code. The neural network system can receive executable code of variable length and reduce its dimensionality by generating a variable-length sequence of features extracted from the executable code. The sequence of features is filtered, and applied to one or more recurrent neural networks and to a neural network. The output of the neural network classifies the data. Other disclosed systems include a system for reducing the dimensionality of command line input using a recurrent neural network. The reduced-dimensionality command line input may be classified using the disclosed neural network systems.
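The data flow in this abstract (convolution filter, then recurrent network, then fully connected layer) can be traced with a toy, untrained sketch. It only shows how a variable-length input collapses to a single score; the scalar hidden state, the kernel, the stride, and every weight value are arbitrary assumptions, and the function names are hypothetical.

```python
import math

def conv1d(seq, kernel, stride):
    """Strided 1-D convolution: shortens the sequence while extracting local features."""
    width = len(kernel)
    return [sum(k * x for k, x in zip(kernel, seq[i:i + width]))
            for i in range(0, len(seq) - width + 1, stride)]

def rnn_last_state(features, w_in, w_rec):
    """Scalar Elman-style recurrence; the final state summarizes any input length."""
    h = 0.0
    for x in features:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def classify(data: bytes, kernel=(0.25, 0.5, 0.25), stride=2,
             w_in=0.8, w_rec=0.5, w_out=2.0, bias=-0.5):
    """Convolution -> recurrent state -> fully connected sigmoid output (toy weights)."""
    scaled = [b / 255.0 for b in data]
    features = conv1d(scaled, kernel, stride)
    h = rnn_last_state(features, w_in, w_rec)
    return 1.0 / (1.0 + math.exp(-(w_out * h + bias)))

short_score = classify(b"MZ\x90\x00\x03")
long_score = classify(bytes(range(200)))
```

Both inputs, 5 bytes and 200 bytes long, produce one score between 0 and 1 because the recurrent step folds the whole feature sequence into a fixed-size state before the final layer. A real system would learn the weights from labeled data.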
-
Publication number: US10726128B2
Publication date: 2020-07-28
Application number: US15657379
Filing date: 2017-07-24
Applicant: CrowdStrike, Inc.
Inventor: Sven Krasser, David Elkind, Patrick Crenshaw, Kirby James Koster
Abstract: Example techniques herein determine that a trial data stream is associated with malware (“dirty”) using a local computational model (CM). The data stream can be represented by a feature vector. A control unit can receive a first, dirty feature vector (e.g., a false miss) and determine the local CM based on the first feature vector. The control unit can receive a trial feature vector representing the trial data stream. The control unit can determine that the trial data stream is dirty if a broad CM or the local CM determines that the trial feature vector is dirty. In some examples, the local CM can define a dirty region in a feature space. The control unit can determine the local CM based on the first feature vector and other clean or dirty feature vectors, e.g., a clean feature vector nearest to the first feature vector.
-
Publication number: US20190273510A1
Publication date: 2019-09-05
Application number: US15909442
Filing date: 2018-03-01
Applicant: CrowdStrike, Inc.
Inventor: David Elkind, Patrick Crenshaw, Sven Krasser
Abstract: Example techniques described herein determine a classification of variable-length source data such as executable code. A neural network system that includes a convolution filter, a recurrent neural network, and a fully connected layer can be configured in a computing device to classify executable code. The neural network system can receive executable code of variable length and reduce its dimensionality by generating a variable-length sequence of features extracted from the executable code. The sequence of features is filtered, and applied to one or more recurrent neural networks and to a neural network. The output of the neural network classifies the data. Other disclosed systems include a system for reducing the dimensionality of command line input using a recurrent neural network. The reduced-dimensionality command line input may be classified using the disclosed neural network systems.
-
Publication number: US20180197089A1
Publication date: 2018-07-12
Application number: US15402524
Filing date: 2017-01-10
Applicant: CrowdStrike, Inc.
Inventor: Sven Krasser, David Elkind, Patrick Crenshaw, Brett Meyer
CPC classification number: G06N20/00, G06F21/56, G06N3/0445, G06N3/0454, G06N3/084, G06N20/10, H04L41/145, H04L41/147, H04L63/1416
Abstract: Example techniques described herein determine a signature or classification of a data stream such as a file. The classification can indicate whether the data stream is associated with malware. A processor can locate training analysis regions of training data streams based on predetermined structure data, and determine training model inputs based on the training analysis regions. The processor can determine a computational model based on the training model inputs. The computational model can receive an input vector and provide a corresponding feature vector. The processor can then locate a trial analysis region of a trial data stream based on the predetermined structure data and determine a trial model input. The processor can operate the computational model based on the trial model input to provide a trial feature vector, e.g., a signature. The processor can operate a second computational model to provide a classification based on the signature.
-
Publication number: US20210075798A1
Publication date: 2021-03-11
Application number: US17087194
Filing date: 2020-11-02
Applicant: CrowdStrike, Inc.
Inventor: Sven Krasser, David Elkind, Brett Meyer, Patrick Crenshaw
Abstract: Example techniques described herein determine a validation dataset, determine a computational model using the validation dataset, or determine a signature or classification of a data stream such as a file. The classification can indicate whether the data stream is associated with malware. A processing unit can determine signatures of individual training data streams. The processing unit can determine, based at least in part on the signatures and a predetermined difference criterion, a training set and a validation set of the training data streams. The processing unit can determine a computational model based at least in part on the training set. The processing unit can then operate the computational model based at least in part on a trial data stream to provide a trial model output. Some examples include determining the validation set based at least in part on the training set and the predetermined criterion for difference between data streams.