System and method for data mining and similarity estimation
Abstract:
A method for data mining includes receiving input vectors and converting them into corresponding sketch feature vectors, each having a number of output dimensions that is less than the number of dimensions of the corresponding input vector. Each sketch feature vector is compared against parameters, and a decision loop generates similarity results based on the comparisons. An estimate of the cosine similarity or Pearson correlation of the input vectors is obtained from estimates of the inner product of two input vectors and of the 2-norm of an input vector. The estimates are obtained using a respective hash table for each input vector, each hash table having a number of entries up to the number of output dimensions of the sketch feature vector. A decision is provided based on the similarity results and on an application of the data mining, such that the decision is implemented by the application.
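The estimation step described in the abstract can be illustrated with a minimal sketch. It assumes a signed feature-hashing (count-sketch style) projection, which the abstract does not confirm as the claimed construction. The names `sketch`, `estimate_inner_product`, `estimate_norm`, and `estimate_cosine`, the parameter `k`, the sparse `{index: value}` input layout, and the use of BLAKE2 as the hash are all hypothetical choices made for illustration.

```python
import hashlib
import math


def _hash(index, salt):
    # Deterministic 64-bit hash of a dimension index (illustrative choice).
    digest = hashlib.blake2b(f"{salt}:{index}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(vector, k, seed=0):
    """Project a sparse vector {index: value} into a hash table of at most k entries."""
    table = {}
    for index, value in vector.items():
        bucket = _hash(index, f"bucket-{seed}") % k
        # A pseudo-random sign keeps colliding dimensions from adding up systematically.
        sign = 1.0 if _hash(index, f"sign-{seed}") & 1 else -1.0
        table[bucket] = table.get(bucket, 0.0) + sign * value
    return table


def estimate_inner_product(table_a, table_b):
    # Inner product of the two sketches; both must use the same k and seed.
    if len(table_a) > len(table_b):
        table_a, table_b = table_b, table_a
    return sum(v * table_b.get(bucket, 0.0) for bucket, v in table_a.items())


def estimate_norm(table):
    # 2-norm of the sketch, used as an estimate of the input vector's 2-norm.
    return math.sqrt(sum(v * v for v in table.values()))


def estimate_cosine(table_a, table_b):
    denom = estimate_norm(table_a) * estimate_norm(table_b)
    return estimate_inner_product(table_a, table_b) / denom if denom else 0.0


if __name__ == "__main__":
    a = {i: float(i % 7 + 1) for i in range(200)}
    b = {i: float((i * 3) % 5 + 1) for i in range(200)}
    ta, tb = sketch(a, k=64), sketch(b, k=64)
    print("estimated cosine similarity:", estimate_cosine(ta, tb))
```

Because Pearson correlation equals the cosine similarity of mean-centered vectors, the same functions could be applied to centered inputs to estimate it. The accuracy of the estimates depends on `k` and on the hash functions chosen.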