Invention Grant
- Patent Title: Code labeling based on tokenized code samples
- Application No.: US14599394
- Application Date: 2015-01-16
- Publication No.: US10044750B2
- Publication Date: 2018-08-07
- Inventor: Benjamin Livshits, Benjamin G. Zorn, Benjamin Stock
- Applicant: Microsoft Technology Licensing, LLC
- Applicant Address: US WA Redmond
- Assignee: Microsoft Technology Licensing, LLC
- Current Assignee: Microsoft Technology Licensing, LLC
- Current Assignee Address: US WA Redmond
- Agency: Drinker Biddle & Reath LLP
- Main IPC: G06F11/00
- IPC: G06F11/00; G06F12/14; G06F12/16; G08B23/00; H04L29/06; G06F21/56; G06F17/30

Abstract:
Disclosed herein are systems and methods for detecting script code malware and generating signatures. A plurality of script code samples are received and transformed into a plurality of tokenized samples. The tokenized samples are based on syntactical elements of the plurality of script code samples. One or more clusters of samples are determined based on similarities in different ones of the plurality of tokenized samples, and known malicious code having a threshold similarity to a representative sample of the cluster of samples is identified. Based on the identifying, the cluster of samples is identified as malicious. Based at least on respective ones of the plurality of tokenized samples associated with the cluster of samples, a generalized code signature usable to identify the script code samples in the cluster of samples is generated.
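The pipeline in the abstract (tokenize, cluster, label by similarity to known malware, generalize a signature) can be illustrated with a minimal Python sketch. This is not the patented implementation. The token classes, the `difflib` similarity ratio, the greedy clustering, the 0.8 threshold, the longest-common-run signature, and all function names are assumptions chosen for illustration only.

```python
import re
from difflib import SequenceMatcher

# Hypothetical token classes; the abstract only says tokens reflect syntax.
TOKEN_RE = re.compile(r"""
    (?P<STRING>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<IDENT>[A-Za-z_$][\w$]*)
  | (?P<PUNCT>[^\s\w])
""", re.VERBOSE)

KEYWORDS = {"var", "function", "return", "if", "else", "for", "while", "new"}


def tokenize(source):
    """Map script text to abstract tokens so renamed variables and
    changed literals produce the same token sequence."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "IDENT":
            tokens.append(text if text in KEYWORDS else "Identifier")
        elif kind == "STRING":
            tokens.append("String")
        elif kind == "NUMBER":
            tokens.append("Number")
        else:
            tokens.append(text)  # punctuation is kept verbatim
    return tokens


def similarity(a, b):
    """Similarity of two token lists in [0, 1] (difflib ratio, an assumption)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster(tokenized, threshold=0.8):
    """Greedy clustering: a sample joins the first cluster whose
    representative (first member) is similar enough, else starts a new one.
    Returns lists of sample indices."""
    clusters = []
    for i, toks in enumerate(tokenized):
        for members in clusters:
            if similarity(tokenized[members[0]], toks) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def is_malicious(representative, known_malicious, threshold=0.8):
    """Label a cluster by comparing its representative to known malware."""
    return any(similarity(representative, k) >= threshold
               for k in known_malicious)


def signature(cluster_tokens):
    """Generalized signature: a contiguous token run shared by every
    sample in the cluster, found by folding longest common matches."""
    common = cluster_tokens[0]
    for other in cluster_tokens[1:]:
        m = SequenceMatcher(None, common, other, autojunk=False) \
            .find_longest_match(0, len(common), 0, len(other))
        common = common[m.a:m.a + m.size]
    return common
```

Because identifiers and literals are abstracted away, two samples that differ only in variable names or shellcode strings tokenize identically and fall into the same cluster, which is the property the signature step relies on.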
Public/Granted literature
- US20160212153A1 Code Labeling Based on Tokenized Code Samples; Public/Granted day: 2016-07-21