Organizing, joining, and performing statistical calculations on massive sets of data

Invention Grant

US09330129B2 Organizing, joining, and performing statistical calculations on massive sets of data (Status: Active)

Patent Title: Organizing, joining, and performing statistical calculations on massive sets of data
Application No.: US14536220

Application Date: 2014-11-07
Publication No.: US09330129B2

Publication Date: 2016-05-03
Inventor: Srinivas S. Vemuri, Maneesh Varshney, Krishna P. Puttaswamy Naga, Rui Liu
Applicant: LinkedIn Corporation
Applicant Address: US CA Mountain View
Assignee: LinkedIn Corporation
Current Assignee: LinkedIn Corporation
Current Assignee Address: US CA Mountain View
Agency: Park, Vaughan, Fleming & Dowler LLP
Main IPC: G06F17/30
IPC: G06F17/30; G06F7/00; G06F7/36

Abstract:

A system, method, and apparatus are provided for organizing and joining massive sets of data (e.g., tens or hundreds of millions of event records). A dataset is Blocked by first identifying a partition key, which comprises one or more columns of the data. Each Block will contain all dataset records that have partition key values assigned to that Block. A cost constraint (e.g., a maximum size, a maximum number of records) may also be applied to the Blocks. A Block index is generated to identify all Blocks, their corresponding (sequential) partition key values, and their locations. A second dataset that includes the partition key column(s) and that must be correlated with the first dataset may then be Blocked according to the same ranges of partition key values (but without the cost constraint). Corresponding Blocks of the datasets may then be Joined/Aggregated, and analyzed as necessary.
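The flow described in the abstract can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the function names (`build_blocks`, `build_index`, `block_by_index`, `join_blocks`), the dict-based record layout, the `location` strings, and the choice of a record-count cost constraint are all assumptions made for illustration. Keys of the second dataset that fall below the first indexed key are placed in the first Block here, which is also an assumption.

```python
from bisect import bisect_right
from itertools import groupby


def _key_of(record, key_cols):
    """Partition key of a record: a tuple of the chosen column values."""
    return tuple(record[c] for c in key_cols)


def build_blocks(records, key_cols, max_records):
    """Block the first dataset under a record-count cost constraint.

    All records sharing a partition key value stay in one Block, so a single
    oversized key group may exceed max_records (assumption of this sketch).
    """
    key = lambda r: _key_of(r, key_cols)
    blocks, current = [], []
    for _, group in groupby(sorted(records, key=key), key=key):
        group = list(group)
        if current and len(current) + len(group) > max_records:
            blocks.append(current)
            current = []
        current.extend(group)
    if current:
        blocks.append(current)
    return blocks


def build_index(blocks, key_cols):
    """Block index: Block id, its sequential key range, and a location."""
    return [
        {
            "block_id": i,
            "first_key": _key_of(b[0], key_cols),
            "last_key": _key_of(b[-1], key_cols),
            "location": f"block-{i}",  # hypothetical storage location
        }
        for i, b in enumerate(blocks)
    ]


def block_by_index(records, key_cols, index):
    """Block a second dataset by the same key ranges, with no cost constraint."""
    starts = [entry["first_key"] for entry in index]
    blocks = [[] for _ in index]
    for r in records:
        i = max(bisect_right(starts, _key_of(r, key_cols)) - 1, 0)
        blocks[i].append(r)
    return blocks


def join_blocks(left, right, key_cols):
    """Hash-join two corresponding Blocks on the partition key."""
    lookup = {}
    for r in right:
        lookup.setdefault(_key_of(r, key_cols), []).append(r)
    return [
        {**l, **r}
        for l in left
        for r in lookup.get(_key_of(l, key_cols), [])
    ]
```

Because corresponding Blocks cover the same partition key range, each pair can be joined independently of the others, for example with `join_blocks(a, b, key_cols)` for each pair `(a, b)` in `zip(first_blocks, second_blocks)`.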

Public/Granted literature

US20150261804A1 ORGANIZING, JOINING, AND PERFORMING STATISTICAL CALCULATIONS ON MASSIVE SETS OF DATA Published/Granted date: 2015-09-17
