Invention Grant
- Patent Title: Lower-dimensional subspace approximation of a dataset
- Application No.: US15295388
- Application Date: 2016-10-17
- Publication No.: US10346405B2
- Publication Date: 2019-07-09
- Inventors: Kenneth L. Clarkson, David P. Woodruff
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant Address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: US NY Armonk
- Agency: Cantor Colburn LLP
- Main IPC: G06F16/33
- IPC: G06F16/33; G06F16/35; G06F16/2455

Abstract:
A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates the number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to the span of an initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of the degree of similarity between documents. The documents can then be classified into the same document cluster or into different clusters based on whether the similarity metric satisfies a threshold value.
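The row-sampling pipeline summarized in the abstract can be illustrated with a short NumPy sketch. This is not the patented method; it is a sketch under stated assumptions: rows are documents and columns are terms, rank-k leverage scores come from an SVD, one round of adaptive sampling uses squared distance to the span of the first sample, the per-stage sample size ceil(k / eps) is a hypothetical choice, cosine similarity stands in for the unspecified similarity metric, and documents meeting a threshold tau are merged with union-find. All function names are hypothetical.

```python
import numpy as np


def rank_k_leverage_scores(A, k):
    """Rank-k leverage score of each row: squared norm of that row of U_k."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U[:, :k] ** 2, axis=1)


def sample_rows(A, k, eps, rng):
    """Leverage-score sampling, then one round of distance-to-span sampling."""
    n = A.shape[0]
    m = min(n, int(np.ceil(k / eps)))  # hypothetical per-stage sample size
    lev = rank_k_leverage_scores(A, k)
    picked = rng.choice(n, size=m, replace=True, p=lev / lev.sum())
    S = A[np.unique(picked)]
    # Squared distance from every row of A to the span of the sampled rows.
    residual = A - A @ np.linalg.pinv(S) @ S
    dist2 = np.sum(residual ** 2, axis=1)
    if dist2.sum() > 0:
        extra = rng.choice(n, size=m, replace=True, p=dist2 / dist2.sum())
        picked = np.concatenate([picked, extra])
    return np.unique(picked)


def compress(A, rows):
    """Coordinates of each row of A in an orthonormal basis of span(A[rows])."""
    _, s, Vt = np.linalg.svd(A[rows], full_matrices=False)
    tol = s[0] * max(A[rows].shape) * np.finfo(float).eps
    basis = Vt[s > tol].T  # d x r matrix whose columns span the sampled rows
    return A @ basis


def cosine_similarity(C):
    """Pairwise cosine similarity between rows of C (zero rows score 0)."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    Cn = C / np.where(norms == 0, 1.0, norms)
    return Cn @ Cn.T


def threshold_clusters(sim, tau):
    """Put documents i, j in one cluster when sim[i, j] >= tau (union-find)."""
    parent = list(range(sim.shape[0]))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(parent)):
        for j in range(i + 1, len(parent)):
            if sim[i, j] >= tau:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(parent))]


def cluster_documents(A, k, eps, tau, seed=0):
    """Sample rows, compress, compare documents, and threshold into clusters."""
    rng = np.random.default_rng(seed)
    rows = sample_rows(A, k, eps, rng)
    return threshold_clusters(cosine_similarity(compress(A, rows)), tau)


# Toy term-count matrix: rows are documents, columns are terms.
A = np.array([
    [3, 1, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [4, 1, 3, 0, 0, 0],
    [0, 0, 0, 1, 3, 2],
    [0, 0, 0, 2, 2, 1],
    [0, 0, 0, 1, 4, 2],
], dtype=float)
labels = cluster_documents(A, k=2, eps=0.5, tau=0.5)
print(labels)  # one cluster label per document
```

Because union-find merges transitively, this grouping behaves like single-linkage clustering at threshold tau. A stricter rule, such as requiring every pair inside a cluster to meet tau, would be an equally plausible reading of "classified into a same document cluster".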
Published Application:
- US20180107716A1 LOWER-DIMENSIONAL SUBSPACE APPROXIMATION OF A DATASET (published 2018-04-19)