Methods and systems for analyzing datasets

    Publication No.: US10445341B2

    Publication date: 2019-10-15

    Application No.: US14300123

    Filing date: 2014-06-09

    Inventor: Andrew Matteson

    Abstract: Computer-implemented methods are provided for analyzing datasets. Consistent with disclosed embodiments, a computing system may be configured to select a cluster from a set of clusters that partition the dataset. Each cluster may include a subset of the dataset and may be associated with a current medoid and a current cost. The computing system may determine a new cost and a new medoid for the selected cluster based on a matrix whose rows correspond to data in a subset of the cluster and whose columns correspond either to data in the whole dataset or only to data in the cluster. The computing system may replace the current medoid of the selected cluster with the new medoid based on the new cost of the selected cluster. The computing system may output information about the clusters to determine a structure of the dataset.
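The abstract does not spell out the medoid update, so the following is only a minimal sketch under stated assumptions: Euclidean distance, a cluster cost equal to the sum of distances from the medoid to every member, and a uniformly random subset of members as medoid candidates. The matrix rows are the sampled candidates and the columns are the cluster's own members (the abstract also allows columns spanning the whole dataset, which is not shown). The names `update_medoid`, `euclidean`, and `num_candidates` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def update_medoid(
    cluster: Sequence[Point],
    current_medoid: Point,
    current_cost: float,
    num_candidates: int,
    dist: Callable[[Point, Point], float] = euclidean,
    rng: Optional[random.Random] = None,
) -> Tuple[Point, float]:
    """Try a sampled subset of the cluster as medoid candidates.

    Returns the (medoid, cost) pair to keep: a candidate replaces the current
    medoid only if its cost (sum of distances to every member) is lower.
    """
    if not cluster:
        raise ValueError("cluster must be non-empty")
    if rng is None:
        rng = random.Random()
    k = min(num_candidates, len(cluster))
    candidates = rng.sample(list(cluster), k)
    # Rows: sampled candidates; columns: every member of the cluster.
    matrix: List[List[float]] = [[dist(c, x) for x in cluster] for c in candidates]
    # Row sum = cost of the cluster if that candidate were its medoid.
    row_costs = [sum(row) for row in matrix]
    best = min(range(k), key=lambda i: row_costs[i])
    if row_costs[best] < current_cost:
        return candidates[best], row_costs[best]
    return current_medoid, current_cost


cluster = [(0.0,), (1.0,), (2.0,), (10.0,)]
medoid, cost = update_medoid(cluster, current_medoid=(10.0,), current_cost=27.0, num_candidates=4)
# cost == 11.0; medoid is (1.0,) or (2.0,), which tie
```

Sampling only `num_candidates` rows keeps each update cheaper than evaluating every member as a candidate, at the price of possibly missing the best medoid in a single pass.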

Methods and systems for analyzing datasets

    Type: Patent application

    Status: Under examination (published)

    Publication No.: US20150356163A1

    Publication date: 2015-12-10

    Application No.: US14300123

    Filing date: 2014-06-09

    Inventor: Andrew Matteson

    CPC classification number: G06F16/285

    Abstract: Computer-implemented methods are provided for analyzing datasets. Consistent with disclosed embodiments, a computing system may be configured to select a cluster from a set of clusters that partition the dataset. Each cluster may include a subset of the dataset and may be associated with a current medoid and a current cost. The computing system may determine a new cost and a new medoid for the selected cluster based on a matrix whose rows correspond to data in a subset of the cluster and whose columns correspond either to data in the whole dataset or only to data in the cluster. The computing system may replace the current medoid of the selected cluster with the new medoid based on the new cost of the selected cluster. The computing system may output information about the clusters to determine a structure of the dataset.

    Methods and systems for analyzing discrete-valued datasets

    Publication No.: US10394898B1

    Publication date: 2019-08-27

    Application No.: US14855247

    Filing date: 2015-09-15

    Inventor: Andrew Matteson

    Abstract: Methods and systems disclosed herein may be used to determine the structure of a dataset comprising discrete-valued data corresponding to features and items. In some embodiments, a device may receive a discrete-valued matrix whose first dimension corresponds to items and whose second dimension corresponds to features. The device may calculate a set of engineered features and a set of weights for the matrix. The device may update the engineered features using the weights, and update the weights using the updated engineered features based on the mutual information between the matrix and one of the updated engineered features. The device may receive a request indicating at least one of the engineered features, identify items based on the matrix and the indicated engineered feature(s), and provide a response based on the identified items.
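The abstract leaves the weight update unspecified beyond its reliance on mutual information. As an illustration only, the sketch below assumes an engineered feature is a single discrete label per item and takes each original feature's weight to be the empirical (plug-in) mutual information between that feature's column and the engineered feature. `plugin_mutual_information` and `feature_weights` are hypothetical names, and the alternating update loop itself is not shown.

```python
from collections import Counter
from math import log
from typing import Hashable, List, Sequence


def plugin_mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in (empirical) estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum of p(x, y) * log(p(x, y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def feature_weights(matrix: Sequence[Sequence[Hashable]], engineered: Sequence[Hashable]) -> List[float]:
    """Hypothetical weights: MI between each feature column and one engineered feature.

    matrix: rows are items, columns are discrete-valued features.
    engineered: one discrete value per item (e.g. a learned group label).
    """
    n_features = len(matrix[0])
    return [
        plugin_mutual_information([row[j] for row in matrix], engineered)
        for j in range(n_features)
    ]


items = [[0, 1], [0, 0], [1, 1], [1, 0]]
weights = feature_weights(items, engineered=[0, 0, 1, 1])
# weights[0] == log(2): feature 0 matches the engineered feature exactly.
# weights[1] == 0.0: feature 1 is independent of it.
```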

Methods and systems for calculating joint statistical information

    Type: Patent application

    Status: Under examination (published)

    Publication No.: US20150356056A1

    Publication date: 2015-12-10

    Application No.: US14733477

    Filing date: 2015-06-08

    Inventor: Andrew Matteson

    CPC classification number: G06F17/18 G06F17/16

    Abstract: Computer-implemented methods and systems are provided for calculating statistical information. Consistent with disclosed embodiments, a computing system may be configured to call a linear algebra subroutine adapted to perform matrix multiplication efficiently, passing a first matrix and a second matrix as arguments. The first matrix may include first elements corresponding to binned values of first measurements associated with a first observation. The second matrix may include second elements corresponding to binned values of second measurements associated with a set of second observations. The computing system may be configured to receive from the linear algebra subroutine a joint value matrix estimating the joint probabilities of the binned measurements. The computing system may determine a structure of the set of second observations based on the joint value matrix. In certain aspects, the computing system may determine the mutual information between the first observation and the set of second observations.
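The central idea here is that counting joint histograms can be expressed as one matrix product. A minimal sketch, assuming every measurement is already binned into integers `0..n_bins-1` and that "binned values" means one-hot (indicator) encoding: with m samples, `a` is m × n_bins for the first observation and `b` places the k second observations side by side (m × k·n_bins), so `a.T @ b / m` holds every pairwise joint probability table at once. NumPy's `@` on float arrays is typically dispatched to the BLAS library NumPy was built against, which stands in for the "linear algebra subroutine". The function names are hypothetical, and computing mutual information per second observation (in nats) is one plausible reading of the final sentence.

```python
import numpy as np


def one_hot(bins: np.ndarray, n_bins: int) -> np.ndarray:
    """Map (m,) integer bin indices to an (m, n_bins) float indicator matrix."""
    out = np.zeros((bins.shape[0], n_bins))
    out[np.arange(bins.shape[0]), bins] = 1.0
    return out


def joint_probabilities(x_bins: np.ndarray, y_bins: np.ndarray, n_bins: int) -> np.ndarray:
    """Estimate p(x, y_j) for every second observation j with one matrix product.

    x_bins: (m,) bin indices of the first observation.
    y_bins: (m, k) bin indices of k second observations.
    Returns a (k, n_bins, n_bins) array of joint probability tables.
    """
    m, k = y_bins.shape
    a = one_hot(x_bins, n_bins)  # (m, n_bins)
    # Place the k one-hot encodings side by side: (m, k * n_bins).
    b = np.concatenate([one_hot(y_bins[:, j], n_bins) for j in range(k)], axis=1)
    counts = a.T @ b  # (n_bins, k * n_bins) co-occurrence counts
    # Split the column axis into (k, n_bins) and move the observation axis first.
    return counts.reshape(n_bins, k, n_bins).transpose(1, 0, 2) / m


def mutual_information_from_joint(joint: np.ndarray) -> np.ndarray:
    """Mutual information (nats) for each (n_bins, n_bins) table in a (k, ...) stack."""
    px = joint.sum(axis=2, keepdims=True)  # marginal of the first observation
    py = joint.sum(axis=1, keepdims=True)  # marginal of each second observation
    expected = px * py  # broadcasts to (k, n_bins, n_bins)
    terms = np.zeros_like(joint)
    mask = joint > 0  # treat 0 * log 0 as 0
    terms[mask] = joint[mask] * np.log(joint[mask] / expected[mask])
    return terms.sum(axis=(1, 2))


x = np.array([0, 0, 1, 1])
y = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])  # column 0 copies x; column 1 is independent of x
mi = mutual_information_from_joint(joint_probabilities(x, y, n_bins=2))
# mi[0] == log(2) (fully dependent), mi[1] == 0.0 (independent)
```

Packing all k second observations into one wide matrix means a single call to the multiplication routine does the work of k separate joint-histogram passes.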