Reducing consumption of computing resources in performing computerized sequence-mining on large data sets

    Publication No.: US11120032B1

    Publication date: 2021-09-14

    Application No.: US17209752

    Filing date: 2021-03-23

    Abstract: Computing resources consumed in performing computerized sequence-mining can be reduced by implementing some examples of the present disclosure. In one example, a system can determine weights for data entries in a data set and then select a group of data entries from the data set based on the weights. Next, the system can determine a group of k-length sequences present in the selected group of data entries by applying a shuffling algorithm. The system can then determine frequencies corresponding to the group of k-length sequences and select candidate sequences from among the group of k-length sequences based on their frequencies. Next, the system can determine support values corresponding to the candidate sequences and then select output sequences from among the candidate sequences based on their support values. The system may then transmit an output signal indicating the selected output sequences to an electronic device.
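The pipeline in this abstract (weight, sample, enumerate k-length sequences, filter by frequency, filter by support) can be illustrated with a minimal Python sketch. It is not the patented method. The abstract does not specify the shuffling algorithm, so a plain sliding-window enumeration is swapped in, and every name and threshold here (`mine_sequences`, `k_windows`, `min_freq`, `min_support`) is a hypothetical choice made for illustration.

```python
import random
from collections import Counter


def k_windows(entry, k):
    """Yield every contiguous k-length subsequence of one entry as a tuple."""
    for i in range(len(entry) - k + 1):
        yield tuple(entry[i:i + k])


def mine_sequences(entries, weights, k, sample_size, min_freq, min_support, seed=0):
    """Hypothetical sketch: sample by weight, then filter by frequency and support."""
    rng = random.Random(seed)
    # Step 1: weighted selection of data entries (with replacement).
    sample = rng.choices(entries, weights=weights, k=sample_size)
    # Step 2: count k-length sequences in the sample only.
    # Assumption: sliding windows stand in for the unspecified shuffling algorithm.
    freq = Counter(seq for entry in sample for seq in k_windows(entry, k))
    # Step 3: candidates are the sequences that are frequent in the sample.
    candidates = [seq for seq, count in freq.items() if count >= min_freq]
    # Step 4: support is the fraction of ALL entries that contain the candidate.
    entry_sets = [set(k_windows(entry, k)) for entry in entries]
    support = {
        seq: sum(seq in s for s in entry_sets) / len(entries)
        for seq in candidates
    }
    # Step 5: output sequences are the candidates with enough support.
    return {seq: value for seq, value in support.items() if value >= min_support}


entries = [["a", "b", "c"], ["a", "b", "d"], ["x", "y", "z"]]
result = mine_sequences(entries, weights=[1, 1, 0], k=2,
                        sample_size=10, min_freq=2, min_support=0.5)
```

Only the candidates that survive the cheap sampled pass are checked against the full data set, which is where a saving in computing resources would plausibly come from.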

    METHODS AND SYSTEMS FOR USING CLUSTERING FOR SPLITTING TREE NODES IN CLASSIFICATION DECISION TREES
    14.
    Invention application
    Status: under examination (published)

    Publication No.: US20140351196A1

    Publication date: 2014-11-27

    Application No.: US14284222

    Filing date: 2014-05-21

    CPC classification number: G06F16/2246 G06N5/003

    Abstract: Systems and methods are provided for determining an optimal splitting scheme for a node in a classification decision tree. A computing system may receive input data related to a decision tree to be generated from a data set. The input data identifies a target attribute of the data set and a set of candidate attributes of the data set to be used as nodes in the decision tree. The computing system may determine, using a clustering algorithm and the set of candidate attributes, a plurality of potential splitting schemes to be used to split a node in the decision tree. The computing system may calculate a splitting measurement for each of the plurality of potential splitting schemes. The computing system may select an optimal splitting scheme from the plurality of potential splitting schemes for each node in the decision tree based on the splitting measurement.
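As a rough illustration of the idea in this abstract, the Python sketch below proposes one splitting scheme per candidate attribute by clustering that attribute's values, scores each scheme, and keeps the best. The abstract names neither the clustering algorithm nor the splitting measurement, so 1-D two-means clustering and information gain are assumptions made here, as are the function names and the sample rows.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def two_means_1d(values, iters=20):
    """Assumed clustering step: 1-D k-means with k=2; returns a 0/1 label per value."""
    lo, hi = min(values), max(values)
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        left = [v for v, a in zip(values, assign) if a == 0]
        right = [v for v, a in zip(values, assign) if a == 1]
        if not left or not right:
            break
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return assign


def best_split(rows, target, candidates):
    """Return (attribute, information gain, partition) for the best-scoring scheme."""
    labels = [row[target] for row in rows]
    base = entropy(labels)
    best = None
    for attr in candidates:
        # One potential splitting scheme per attribute, produced by clustering.
        assign = two_means_1d([row[attr] for row in rows])
        # Assumed splitting measurement: information gain on the target attribute.
        gain = base
        for side in (0, 1):
            part = [label for label, a in zip(labels, assign) if a == side]
            if part:
                gain -= len(part) / len(labels) * entropy(part)
        if best is None or gain > best[1]:
            best = (attr, gain, assign)
    return best


rows = [
    {"age": 20, "noise": 1, "buy": "no"},
    {"age": 22, "noise": 9, "buy": "no"},
    {"age": 60, "noise": 2, "buy": "yes"},
    {"age": 65, "noise": 8, "buy": "yes"},
]
attr, gain, partition = best_split(rows, target="buy", candidates=["age", "noise"])
```

In this toy data set the clusters found on `age` line up with the target classes while those found on `noise` do not, so `age` is selected.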

    METHODS AND SYSTEMS FOR DATA REDUCTION IN CLUSTER ANALYSIS IN DISTRIBUTED DATA ENVIRONMENTS
    15.
    Invention application
    Status: under examination (published)

    Publication No.: US20140330826A1

    Publication date: 2014-11-06

    Application No.: US14270142

    Filing date: 2014-05-05

    CPC classification number: G06F16/285

    Abstract: Systems and methods for data reduction of a data set are included. A computing system may group data points in a data set into a number of data point bubbles represented by a number of representative points. A data point bubble may include one or more data points from the data set and a representative point from the data set. The computing system may calculate a cluster assignment for the representative point by executing a clustering algorithm using the number of representative points.
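The data-reduction idea in this abstract can be sketched in Python as follows: group points into bubbles around representative points, cluster only the representatives, and reuse each representative's cluster for its bubble. Several details are assumptions not stated in the abstract: representatives are picked with a fixed stride, the clustering algorithm is a basic k-means seeded with the first k representatives, and bubble members inherit their representative's label. All function names are hypothetical.

```python
import math


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def build_bubbles(points, stride):
    """Assumed selection rule: every stride-th point is a representative."""
    reps = points[::stride]
    bubbles = [[] for _ in reps]
    for p in points:
        bubbles[nearest(p, reps)].append(p)
    return reps, bubbles


def kmeans_labels(points, k, iters=25):
    """Basic k-means; the first k points serve as initial centers (an assumption)."""
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cluster_via_bubbles(points, stride, k):
    """Cluster the representatives only, then label each point via its bubble."""
    reps, bubbles = build_bubbles(points, stride)
    rep_labels = kmeans_labels(reps, k)
    return {p: rep_labels[i] for i, bubble in enumerate(bubbles) for p in bubble}


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
point_labels = cluster_via_bubbles(points, stride=3, k=2)
```

The clustering step runs on two representatives instead of six points here; on a large distributed data set the same reduction would apply to far larger counts.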
