Efficient data infrastructure for high dimensional data analysis

Invention Grant

US07870114B2 Efficient data infrastructure for high dimensional data analysis (in force)

Patent Title: Efficient data infrastructure for high dimensional data analysis
Application No.: US11818879

Application Date: 2007-06-15
Publication No.: US07870114B2

Publication Date: 2011-01-11
Inventor: Haidong Zhang, Guowei Liu, Yantao Li, Bing Sun, Jian Wang
Applicant: Haidong Zhang, Guowei Liu, Yantao Li, Bing Sun, Jian Wang
Applicant Address: US WA Redmond
Assignee: Microsoft Corporation
Current Assignee: Microsoft Corporation
Current Assignee Address: US WA Redmond
Main IPC: G06F17/30
IPC: G06F17/30

Abstract:

Described is a technology by which high dimensional source data corresponding to rows of records with identifiers, and columns comprising dimensions of data values, are processed into a file model for efficient access. An inverted index corresponding to any dimension is built by mapping data from raw dimension values to mapped values based on mapping entries in a dimension table. The record identifiers are arranged into subgroups based on their mapped value; a count and/or an offset may be maintained for locating each of the subgroups. The raw values for a dimension are maintained within a raw value file. For sparse data, the raw value file may be compressed, e.g., by excluding nulls and associating a record identifier with each non-null. A data manager provides access to data in the data files, such as by offering various functions, using caching for efficiency.
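
As a rough illustration of the structures the abstract names, the following minimal Python sketch builds a dimension table, an inverted index with per-subgroup counts and offsets, and a compressed raw value list for one sparse dimension. The abstract does not specify file layouts or function names, so everything here is an assumption: in-memory lists stand in for the data files, and build_dimension_table, build_inverted_index, lookup, and compress_sparse are hypothetical names chosen for the example.

```python
# Illustrative sketch only; names and in-memory layout are assumptions,
# not the patent's actual file model.

def build_dimension_table(raw_values):
    """Map each distinct non-null raw value to a small integer (mapped value)."""
    table = {}
    for value in raw_values:
        if value is not None and value not in table:
            table[value] = len(table)
    return table


def build_inverted_index(raw_values, dim_table):
    """Arrange record ids into subgroups by mapped value.

    Returns (record_ids, counts, offsets): subgroup m occupies
    record_ids[offsets[m] : offsets[m] + counts[m]].
    """
    groups = [[] for _ in dim_table]
    for record_id, value in enumerate(raw_values):
        if value is not None:
            groups[dim_table[value]].append(record_id)

    record_ids, counts, offsets = [], [], []
    for group in groups:
        offsets.append(len(record_ids))
        counts.append(len(group))
        record_ids.extend(group)
    return record_ids, counts, offsets


def lookup(raw_value, dim_table, record_ids, counts, offsets):
    """Return the record ids whose dimension value equals raw_value."""
    mapped = dim_table.get(raw_value)
    if mapped is None:
        return []
    start = offsets[mapped]
    return record_ids[start:start + counts[mapped]]


def compress_sparse(raw_values):
    """Drop nulls and pair each remaining raw value with its record id."""
    return [(rid, v) for rid, v in enumerate(raw_values) if v is not None]


# One dimension (column) over seven records; None marks a null.
color = ["red", None, "blue", "red", None, "blue", "red"]

dim_table = build_dimension_table(color)
record_ids, counts, offsets = build_inverted_index(color, dim_table)
blue_records = lookup("blue", dim_table, record_ids, counts, offsets)
sparse_raw = compress_sparse(color)
```

In this sketch a lookup touches only the dimension table and one contiguous slice of record ids, which is the access pattern the count and offset are meant to support. The compressed list stores five entries instead of seven because the two nulls are excluded.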

Public/Granted literature

US20080313213A1 Efficient data infrastructure for high dimensional data analysis Public/Granted day: 2008-12-18
