Invention Grant
US09043247B1 Systems and methods for classifying documents for data loss prevention
Legal status: In force
- Patent Title: Systems and methods for classifying documents for data loss prevention
- Application No.: US13405293
- Application Date: 2012-02-25
- Publication No.: US09043247B1
- Publication Date: 2015-05-26
- Inventor: Michael Hart, Kushal Tayal, Phillip DiCorpo
- Applicant: Michael Hart, Kushal Tayal, Phillip DiCorpo
- Applicant Address: US CA Mountain View
- Assignee: Symantec Corporation
- Current Assignee: Symantec Corporation
- Current Assignee Address: US CA Mountain View
- Agency: ALG Intellectual Property, LLC
- Main IPC: G06F15/18
- IPC: G06F15/18; G06F17/30; G06N5/02

Abstract:
A computer-implemented method for classifying documents for data loss prevention may include 1) identifying a set of training documents for a machine learning classifier configured for data loss prevention, 2) performing a semantic analysis on the set of training documents to identify a plurality of topics within the set of training documents, 3) applying a similarity metric to the plurality of topics to identify at least one unrelated topic whose similarity to the other topics within the plurality of topics, as determined by the similarity metric, falls below a similarity threshold, 4) identifying, based on the semantic analysis, at least one irrelevant training document within the set of training documents in which a predominance of the unrelated topic is above a predominance threshold, and 5) excluding the irrelevant training document from the set of training documents based on the predominance of the unrelated topic within the irrelevant training document. Various other methods, systems, and computer-readable media are also disclosed.
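Steps 3 through 5 of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not name a similarity metric or a semantic-analysis technique, so the sketch assumes cosine similarity between topic-term weight vectors, treats a topic as unrelated when its mean similarity to the other topics falls below the threshold, and takes the output of the semantic analysis (for example, from a topic model) as given input. All function names, parameter names, and sample values are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def unrelated_topics(topic_terms, similarity_threshold):
    """Step 3: return indices of topics whose mean similarity to the
    other topics falls below the similarity threshold."""
    unrelated = []
    for i, topic in enumerate(topic_terms):
        sims = [cosine(topic, other)
                for j, other in enumerate(topic_terms) if j != i]
        if sims and sum(sims) / len(sims) < similarity_threshold:
            unrelated.append(i)
    return unrelated


def filter_training_docs(doc_topics, topic_terms,
                         similarity_threshold, predominance_threshold):
    """Steps 4 and 5: drop documents in which an unrelated topic's weight
    exceeds the predominance threshold. Returns (kept doc indices,
    unrelated topic indices)."""
    bad = unrelated_topics(topic_terms, similarity_threshold)
    kept = [d for d, weights in enumerate(doc_topics)
            if not any(weights[t] > predominance_threshold for t in bad)]
    return kept, bad


# Hypothetical output of a semantic analysis (steps 1 and 2):
# each row of topic_terms is one topic's weights over a 4-term vocabulary,
# each row of doc_topics is one document's weights over the 3 topics.
topic_terms = [
    [0.5, 0.4, 0.1, 0.0],
    [0.4, 0.5, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.9],  # shares almost no vocabulary with the others
]
doc_topics = [
    [0.6, 0.4, 0.0],
    [0.1, 0.1, 0.8],  # dominated by topic 2
    [0.5, 0.3, 0.2],
]

kept, bad = filter_training_docs(doc_topics, topic_terms,
                                 similarity_threshold=0.3,
                                 predominance_threshold=0.5)
print("unrelated topics:", bad)
print("documents kept:", kept)
```

With these sample values, topic 2 is flagged as unrelated and document 1, which is dominated by that topic, is dropped from the training set. Both thresholds are tunable inputs here; the abstract does not state specific values.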