Invention Grant
US08214365B1 Measuring confidence of file clustering and clustering based file classification
Legal status: Active
- Patent Title: Measuring confidence of file clustering and clustering based file classification
- Patent Title (中): 测量文件聚类和基于聚类的文件分类的置信度
- Application No.: US13036864
- Application Date: 2011-02-28
- Publication No.: US08214365B1
- Publication Date: 2012-07-03
- Inventor: Pratyusa Kumar Manadhata, Sandeep B. Bhatkar, Kent E. Griffin
- Applicant: Pratyusa Kumar Manadhata, Sandeep B. Bhatkar, Kent E. Griffin
- Applicant Address: US CA Mountain View
- Assignee: Symantec Corporation
- Current Assignee: Symantec Corporation
- Current Assignee Address: US CA Mountain View
- Agency: Brill Law Office
- Agent: Jeffrey Brill
- Main IPC: G06F7/00
- IPC: G06F7/00

Abstract:
A uniformity of a cluster of samples is determined, and a corresponding raw confidence value is calculated. A confidence interval weight is calculated using a confidence interval to determine reliability of the uniformity. A trace length weight is calculated, as a function of traces of the samples. An n-gram weight is calculated, as a function of numbers of n-grams generated by the samples. A compactness weight is calculated, as a function of the similarity of the samples. A cluster weight is calculated as a function of the four above-described weights. A cluster confidence measurement is calculated as a function of the cluster weight and the raw confidence value. When a new sample is assigned to the cluster, an assignment confidence measurement is calculated, as a function of the cluster's confidence measurement and the sample's trace length, n-grams and similarity.
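The abstract names the quantities that are combined but not the formulas behind them. The following is a minimal Python sketch of one way the cluster confidence measurement could be assembled. Every concrete choice here is an assumption made for illustration, not taken from the patent: uniformity as the share of the majority label, a Wilson score interval for the confidence interval weight, saturating ratios for the trace length and n-gram weights (with hypothetical targets `trace_target` and `ngram_target`), mean similarity for compactness, and a plain product to combine the weights.

```python
import math
from collections import Counter


def uniformity(labels):
    """Share of samples carrying the most common label (assumed definition)."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def wilson_interval(p, n, z=1.96):
    """Wilson score interval for a proportion p observed over n samples."""
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def confidence_interval_weight(p, n):
    """Assumed mapping: a narrower interval gives a weight closer to 1."""
    low, high = wilson_interval(p, n)
    return 1.0 - (high - low)


def saturating_weight(values, target):
    """Assumed mapping: mean(values) / target, capped at 1."""
    mean = sum(values) / len(values)
    return min(1.0, mean / target)


def cluster_confidence(labels, trace_lengths, ngram_counts, similarities,
                       trace_target=100, ngram_target=50):
    """Combine the raw confidence with the four weights named in the abstract.

    similarities: per-sample similarity scores in [0, 1] (assumed input).
    trace_target, ngram_target: hypothetical saturation thresholds.
    """
    raw_confidence = uniformity(labels)
    weights = [
        confidence_interval_weight(raw_confidence, len(labels)),
        saturating_weight(trace_lengths, trace_target),
        saturating_weight(ngram_counts, ngram_target),
        sum(similarities) / len(similarities),  # compactness weight
    ]
    cluster_weight = math.prod(weights)  # assumed combination: product
    return cluster_weight * raw_confidence
```

Under these assumptions, a cluster with few samples, short traces, few n-grams, or loosely similar members gets a lower cluster weight, so even a perfectly uniform cluster is reported with reduced confidence. An assignment confidence for a new sample could be sketched the same way, by scaling the cluster's confidence by that sample's own trace length, n-gram, and similarity weights.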