Fast binary rule extraction for large scale text data

Invention Grant

US08832015B2 Fast binary rule extraction for large scale text data (in force)

Patent Title: Fast binary rule extraction for large scale text data
Application No.: US13624052

Application Date: 2012-09-21
Publication No.: US08832015B2

Publication Date: 2014-09-09
Inventor: James Allen Cox, Zheng Zhao
Applicant: SAS Institute Inc.
Applicant Address: US NC Cary
Assignee: SAS Institute Inc.
Current Assignee: SAS Institute Inc.
Current Assignee Address: US NC Cary
Agency: Kilpatrick Townsend & Stockton LLP
Main IPC: G06F17/00
IPC: G06F17/00; G06N5/02

Abstract:

Systems and methods for identifying data files that have a common characteristic are provided. A plurality of data files including one or more data files having a common characteristic are received. A potential rule is generated by selecting key terms from a list that satisfy a term evaluation metric, and the potential rule is evaluated using a rule evaluation metric. The potential rule is added to the rule set if the rule evaluation metric is satisfied. Based upon the potential rule being added to the rule set, data files covered by the potential rule are removed from the plurality of data files. The potential rule generation and evaluation steps are repeated until a stopping criterion is met. After the stopping criterion has been met, the rule set is used to identify other data files having the common characteristic.
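The loop described in the abstract can be sketched as a sequential-covering procedure. The following is a minimal illustration, not the patented implementation: it assumes each data file is reduced to a set of terms plus a boolean label, that a rule is a conjunction of terms that must all be present, and that precision serves as both the term evaluation metric and the rule evaluation metric. The function names, thresholds, and sample data are all hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# Assumed representation: (terms in the file, has the common characteristic).
Doc = Tuple[Set[str], bool]
Rule = FrozenSet[str]


def precision(rule: Rule, docs: List[Doc]) -> float:
    """Fraction of the files covered by `rule` that have the characteristic."""
    covered = [label for terms, label in docs if rule <= terms]
    return sum(covered) / len(covered) if covered else 0.0


def grow_rule(docs: List[Doc], term_list: List[str], min_gain: float) -> Rule:
    """Build one potential rule by adding terms that pass the term metric.

    Assumed term evaluation metric: a term is kept only if it raises the
    rule's precision on the remaining files by at least `min_gain`.
    """
    rule: Rule = frozenset()
    best = precision(rule, docs)  # the empty rule covers every file
    for term in term_list:
        candidate = rule | {term}
        score = precision(candidate, docs)
        if score - best >= min_gain:
            rule, best = candidate, score
    return rule


def extract_rules(docs: List[Doc], term_list: List[str], min_gain: float = 0.1,
                  min_rule_precision: float = 0.8, max_rules: int = 10) -> List[Rule]:
    """Repeat rule generation and evaluation until a stopping criterion is met."""
    remaining = list(docs)
    rule_set: List[Rule] = []
    # Assumed stopping criteria: rule budget used up, no positive files left,
    # or no acceptable rule could be generated.
    while len(rule_set) < max_rules and any(label for _, label in remaining):
        rule = grow_rule(remaining, term_list, min_gain)
        if not rule or precision(rule, remaining) < min_rule_precision:
            break
        rule_set.append(rule)
        # Remove the files covered by the newly added rule.
        remaining = [(t, l) for t, l in remaining if not rule <= t]
    return rule_set


def matches(terms: Set[str], rule_set: List[Rule]) -> bool:
    """Use the finished rule set to flag another file."""
    return any(rule <= terms for rule in rule_set)


# Hypothetical toy data: three files with the characteristic, three without.
DOCS: List[Doc] = [
    ({"refund", "late", "order"}, True),
    ({"refund", "broken"}, True),
    ({"late", "delivery"}, True),
    ({"order", "thanks"}, False),
    ({"great", "delivery"}, False),
    ({"late", "night", "thanks"}, False),
]
RULES = extract_rules(DOCS, ["refund", "late", "delivery", "order"])
print(RULES)
```

On this toy data the sketch first accepts the single-term rule `refund`, removes the two files it covers, and then grows the two-term rule `late` AND `delivery` for the remaining positive file. Removing covered files after each accepted rule is what lets later rules focus on files that earlier rules missed.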

Published/granted documents

US20140089247A1 Fast Binary Rule Extraction for Large Scale Text Data (published 2014-03-27)

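IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models are classified in G06N)
G06F17/00	Digital computing or data processing equipment or methods, specially adapted for specific functions (information retrieval, database structures or file system structures are classified in G06F 16/00)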