Invention Grant
US08832015B2 Fast binary rule extraction for large scale text data (Active)

Fast binary rule extraction for large scale text data
Abstract:
Systems and methods are provided for identifying data files that have a common characteristic. A plurality of data files, including one or more data files having the common characteristic, is received. A potential rule is generated by selecting, from a list, key terms that satisfy a term evaluation metric, and the potential rule is evaluated using a rule evaluation metric. If the rule evaluation metric is satisfied, the potential rule is added to a rule set, and the data files covered by the potential rule are removed from the plurality of data files. The potential rule generation and evaluation steps are repeated until a stopping criterion is met, after which the rule set is used to identify other data files having the common characteristic.
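The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the metrics or the stopping criterion, so the sketch makes the following assumptions: each data file is reduced to a set of terms with a boolean label, a rule is a conjunction of key terms, the term evaluation metric keeps a term only if it strictly raises the rule's precision on the remaining files, the rule evaluation metric is a precision threshold, and the loop stops when no labeled files remain, no acceptable rule is found, or a rule cap is reached. All function and parameter names (`grow_rule`, `extract_rules`, `rule_threshold`, `max_rules`, `matches`) are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Doc = Tuple[Set[str], bool]  # (terms in the file, has the common characteristic?)
Rule = FrozenSet[str]        # conjunction: a file is covered if it contains every term


def precision(rule: Rule, docs: Sequence[Doc]) -> float:
    """Fraction of files covered by the rule that carry the characteristic."""
    covered = [label for terms, label in docs if rule <= terms]
    return sum(covered) / len(covered) if covered else 0.0


def grow_rule(docs: Sequence[Doc], key_terms: Sequence[str]) -> Rule:
    """Build a potential rule from key terms (assumed term metric: precision gain)."""
    rule: Rule = frozenset()
    best = precision(rule, docs)  # the empty rule covers every file
    improved = True
    while improved:
        improved = False
        for term in key_terms:
            if term in rule:
                continue
            candidate = rule | {term}
            score = precision(candidate, docs)
            if score > best:  # keep the term only if it strictly helps
                rule, best, improved = candidate, score, True
    return rule


def extract_rules(docs: Sequence[Doc], key_terms: Sequence[str],
                  rule_threshold: float = 0.9, max_rules: int = 10) -> List[Rule]:
    """Generate, evaluate, and accept rules until a stopping criterion is met."""
    remaining = list(docs)
    rule_set: List[Rule] = []
    # Assumed stopping criteria: rule cap reached or no labeled files left.
    while len(rule_set) < max_rules and any(label for _, label in remaining):
        rule = grow_rule(remaining, key_terms)
        # Assumed rule metric: non-empty rule with precision above a threshold.
        if not rule or precision(rule, remaining) < rule_threshold:
            break
        rule_set.append(rule)
        # Remove the files the accepted rule covers before the next round.
        remaining = [(t, l) for t, l in remaining if not rule <= t]
    return rule_set


def matches(rule_set: Sequence[Rule], terms: Set[str]) -> bool:
    """Apply the finished rule set to a new file."""
    return any(rule <= terms for rule in rule_set)


# Toy data, invented for illustration only.
docs = [
    ({"refund", "order"}, True),
    ({"refund", "late"}, True),
    ({"crash", "login"}, True),
    ({"order", "thanks"}, False),
    ({"login", "thanks"}, False),
]
rules = extract_rules(docs, ["refund", "crash", "order", "login"])
```

On this toy data the first pass accepts a rule on "refund" and removes the two files it covers. The second pass accepts a rule on "crash", after which no labeled files remain and the loop stops. Removing covered files after each accepted rule is what lets later rules focus on files that earlier rules missed.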