Abstract:
Computer-implemented methods are provided for analyzing datasets. Consistent with disclosed embodiments, a computing system may be configured to select a cluster from a set of clusters partitioning a dataset. Each cluster may include a subset of the dataset and may be associated with a current medoid and a current cost. The computing system may determine a new cost and a new medoid of the selected cluster based on a matrix whose rows correspond to data in a subset of the cluster. The columns of the matrix may correspond to data in the dataset or only to data in the cluster. The computing system may replace the current medoid of the selected cluster with the new medoid based on the new cost of the selected cluster. The computing system may output cluster information to determine a structure of the dataset.
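The medoid update described in this abstract can be illustrated with a minimal sketch. The function name `update_medoid`, its parameters, the Euclidean distance, and the sum-of-distances cost are all assumptions made for illustration; the abstract does not specify a distance measure, a cost function, or how the subset of rows is chosen. Here the matrix columns are restricted to data in the cluster, which is one of the two options the abstract mentions.

```python
import numpy as np

def update_medoid(points, members, current_medoid, current_cost,
                  n_candidates=None, seed=None):
    """Hypothetical helper: propose a new medoid for one cluster.

    points         : (n, d) array holding the whole dataset
    members        : indices of the points that belong to the selected cluster
    current_medoid : index of the cluster's current medoid
    current_cost   : cost currently associated with the cluster
    n_candidates   : how many cluster members to try as medoids (None = all)
    """
    members = np.asarray(members)
    if n_candidates is None or n_candidates >= len(members):
        candidates = members
    else:
        rng = np.random.default_rng(seed)
        candidates = rng.choice(members, size=n_candidates, replace=False)

    # Rows: candidate medoids (a subset of the cluster).
    # Columns: every point in the cluster.
    diff = points[candidates][:, None, :] - points[members][None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Assumed cost: sum of distances from a candidate to all cluster members.
    costs = dist.sum(axis=1)
    best = int(np.argmin(costs))
    new_cost = float(costs[best])

    # Replace the medoid only when the new cost is lower.
    if new_cost < current_cost:
        return int(candidates[best]), new_cost
    return current_medoid, current_cost
```

For three collinear cluster members at x = 0, 1, 2 with the current medoid at x = 0 (cost 3), this sketch would move the medoid to the middle point, whose cost is 2.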
Abstract:
Methods and systems disclosed herein may be used to determine the structure of a dataset comprising discrete-valued data corresponding to features and items. In some embodiments, a device may receive a discrete-valued matrix with a first dimension corresponding to items and a second dimension corresponding to features. The device may calculate an engineered features set and a weights set for the matrix. The device may update the engineered features set using the weights set, and may then update the weights set using the updated engineered features set, based on the mutual information between the matrix and one engineered feature of the updated engineered features set. The device may receive a request indicating at least one engineered feature of the engineered features set, identify items based on the matrix and the indicated engineered feature or features, and provide a response based on the identified items.
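One way the mutual-information step might look is sketched below. This is not the disclosed update rule, which the abstract does not spell out. It assumes, purely for illustration, that an engineered feature is a discrete label per item and that each original feature is weighted by its normalized mutual information with that label. The names `mutual_information` and `feature_weights` are hypothetical.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)

    # Contingency table of co-occurrence counts, normalized to probabilities.
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()

    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, ny)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def feature_weights(matrix, engineered):
    """Assumed weighting: MI of each column with one engineered feature,
    normalized to sum to 1 (uniform if every MI is zero)."""
    mi = np.array([mutual_information(matrix[:, j], engineered)
                   for j in range(matrix.shape[1])])
    total = mi.sum()
    if total > 0:
        return mi / total
    return np.full(len(mi), 1.0 / len(mi))
```

Under this assumed scheme, a feature column that matches the engineered feature exactly receives all the weight, while a statistically independent column receives none.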
Abstract:
Computer-implemented methods and systems are provided for calculating statistical information. Consistent with disclosed embodiments, a computing system may be configured to call a linear algebra subroutine adapted to efficiently perform matrix multiplication, providing a first matrix and a second matrix as arguments. The first matrix may include first elements corresponding to binned values of first measurements associated with a first observation. The second matrix may include second elements corresponding to binned values of second measurements associated with a set of second observations. The computing system may be configured to receive, from the linear algebra subroutine, a joint value matrix estimating the joint probabilities for the binned measurements. The computing system may determine a structure of the set of second observations based on the joint value matrix. In certain aspects, the computing system may determine the mutual information between the first observation and the set of second observations.
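The idea of obtaining joint probabilities from a single matrix multiplication can be sketched as follows. The one-hot (indicator) encoding of binned values, the matrix layouts, and the names `one_hot` and `joint_and_mi` are assumptions for illustration; the abstract does not state how the matrix elements encode the binned values. NumPy's `@` operator stands in here for the linear algebra subroutine.

```python
import numpy as np

def one_hot(binned, n_bins):
    """(n_samples, n_bins) indicator matrix for integer bin indices."""
    binned = np.asarray(binned)
    out = np.zeros((len(binned), n_bins))
    out[np.arange(len(binned)), binned] = 1.0
    return out

def joint_and_mi(first, seconds, n_bins):
    """first   : (n_samples,) bin indices for the first observation
    seconds : (n_obs, n_samples) bin indices for the set of second observations
    Returns the joint probability tables, shape (n_obs, n_bins, n_bins),
    and the mutual information (in nats) for each second observation."""
    seconds = np.asarray(seconds)
    n_obs, n_samples = seconds.shape

    a = one_hot(first, n_bins).T                               # (n_bins, n_samples)
    b = np.hstack([one_hot(row, n_bins) for row in seconds])   # (n_samples, n_obs * n_bins)

    # One matrix product counts every (bin_i, bin_j) co-occurrence at once.
    joint = (a @ b) / n_samples
    joint = joint.reshape(n_bins, n_obs, n_bins).transpose(1, 0, 2)

    px = joint.sum(axis=2, keepdims=True)   # marginal of the first observation
    py = joint.sum(axis=1, keepdims=True)   # marginal of each second observation
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(joint > 0, joint * np.log(joint / (px * py)), 0.0)
    return joint, terms.sum(axis=(1, 2))
```

Stacking the indicator blocks of all second observations side by side lets one product cover the whole set, which is where an optimized matrix multiplication routine would pay off in this sketch.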