Lower-dimensional subspace approximation of a dataset

Invention Grant

US10346405B2 Lower-dimensional subspace approximation of a dataset (in force)

Patent Title: Lower-dimensional subspace approximation of a dataset
Application No.: US15295388

Application Date: 2016-10-17
Publication No.: US10346405B2

Publication Date: 2019-07-09
Inventor: Kenneth L. Clarkson, David P. Woodruff
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Applicant Address: US NY Armonk
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee Address: US NY Armonk
Agency: Cantor Colburn LLP
Main IPC: G06F16/33
IPC: G06F16/33; G06F16/35; G06F16/2455

Abstract:

A lower-dimensional representation (e.g., an approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. The input is a matrix of data points in which each entry indicates the number of times a particular term in a set of terms appears in a particular document in a set of documents. A lower-dimensional compressed matrix is obtained from this matrix by sampling its rows based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to the span of an initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of the degree of similarity between documents. The documents can then be classified into the same document cluster or into different clusters based on whether the similarity metric satisfies a threshold value.
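
The row-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the patented procedure: it assumes ordinary (l2) leverage scores computed by Gram-Schmidt, a terms-by-documents layout (rows are terms, columns are documents), and cosine similarity as the similarity metric. The names `leverage_scores`, `sample_rows`, `cosine_similarity`, `num_samples`, and `THRESHOLD` are hypothetical; in practice the sample count would be derived from the target rank and accuracy tolerance, and the distance-to-span resampling step mentioned in the abstract is omitted.

```python
import math
import random


def leverage_scores(A):
    """Row leverage scores of A (a list of rows), via Gram-Schmidt on its columns."""
    n, d = len(A), len(A[0])
    basis = []  # orthonormal vectors of length n spanning the column space of A
    for j in range(d):
        v = [A[i][j] for i in range(n)]
        for q in basis:
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-10:  # skip columns that already lie in the span
            basis.append([vi / norm for vi in v])
    # Leverage of row i is the squared norm of row i of the orthonormal basis.
    return [sum(q[i] ** 2 for q in basis) for i in range(n)]


def sample_rows(A, num_samples, rng):
    """Sample rows with probability proportional to leverage, then rescale them."""
    scores = leverage_scores(A)
    total = sum(scores)  # assumes A is not the all-zero matrix
    probs = [s / total for s in scores]
    picks = rng.choices(range(len(A)), weights=probs, k=num_samples)
    return [[x / math.sqrt(num_samples * probs[i]) for x in A[i]] for i in picks]


def cosine_similarity(M, a, b):
    """Cosine similarity between columns a and b (two documents) of M."""
    dot = sum(row[a] * row[b] for row in M)
    norm_a = math.sqrt(sum(row[a] ** 2 for row in M))
    norm_b = math.sqrt(sum(row[b] ** 2 for row in M))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy term-document counts: 6 terms (rows) by 3 documents (columns).
counts = [
    [3, 2, 0],
    [2, 3, 0],
    [1, 1, 0],
    [0, 0, 4],
    [0, 0, 2],
    [0, 1, 3],
]

compressed = sample_rows(counts, num_samples=4, rng=random.Random(0))

THRESHOLD = 0.8  # hypothetical cutoff for "same cluster"
same_cluster = cosine_similarity(compressed, 0, 1) >= THRESHOLD
```

Document similarity is evaluated on `compressed`, which has fewer rows than `counts`. Because the rows are chosen at random, the result varies with the seed and the number of samples, so a sample this small gives only a rough approximation of the similarity computed on the full matrix.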

Public/Granted literature

US20180107716A1 LOWER-DIMENSIONAL SUBSPACE APPROXIMATION OF A DATASET Public/Granted date: 2018-04-19

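IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F16/00	Information retrieval; database structures; file system structures
G06F16/30	. Unstructured textual data (document management systems: see G06F 16/93)
G06F16/33	.. Querying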