Invention Grant
US07987177B2 Method for estimating the number of distinct values in a partitioned dataset
In force
- Patent Title: Method for estimating the number of distinct values in a partitioned dataset
- Application No.: US12022601
- Application Date: 2008-01-30
- Publication No.: US07987177B2
- Publication Date: 2011-07-26
- Inventor: Kevin Scott Beyer, Rainer Gemulla, Peter Jay Haas, Berthold Reinwald, John Sismanis
- Applicant: Kevin Scott Beyer, Rainer Gemulla, Peter Jay Haas, Berthold Reinwald, John Sismanis
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: IP Authority, LLC
- Agent: Ramraj Soundararajan
- Main IPC: G06F17/00
- IPC: G06F17/00; G06F17/30

Abstract:
The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. The present invention provides synopses for DV estimation in the setting of a partitioned dataset, as well as corresponding DV estimators that exploit these synopses. Whenever an output compound data partition is created via a multiset operation on a pair of (possibly compound) input partitions, the synopsis for the output partition can be obtained by combining the synopses of the input partitions. If the input partitions are compound partitions, it is not necessary to access the synopses for all the base partitions that were used to construct the input partitions. Superior (in certain cases near-optimal) accuracy in DV estimates is maintained, especially when the synopsis size is small. The synopses can be created in parallel, and can also handle deletions of individual partition elements.
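The combining property described in the abstract can be illustrated with a small sketch. The abstract does not say how the synopses are built, so this example assumes a plain k-minimum-values (KMV) style synopsis: each partition keeps the k smallest hash values of its elements, and the synopsis of a multiset union is derived from the two input synopses alone. The function names, the SHA-256 hash, and the (k - 1) / (k-th smallest hash) estimator are illustrative assumptions, not the claimed method. The sketch covers only the union operation and does not handle deletions.

```python
import hashlib


def hash01(value):
    """Map a value to a pseudo-random float in (0, 1] (hash choice is an assumption)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def build_synopsis(partition, k):
    """Keep the k smallest distinct hash values seen in one partition."""
    return sorted({hash01(v) for v in partition})[:k]


def union_synopsis(syn_a, syn_b, k):
    """Synopsis of the multiset union, computed from the two input synopses only."""
    return sorted(set(syn_a) | set(syn_b))[:k]


def estimate_dv(synopsis, k):
    """Exact count if fewer than k hashes were seen, else (k - 1) / k-th smallest hash."""
    if len(synopsis) < k:
        return len(synopsis)
    return (k - 1) / synopsis[k - 1]


# Hypothetical partitions: two overlapping ranges with 10,000 distinct values in total.
k = 256
part_a = list(range(0, 6000))
part_b = list(range(4000, 10000))
syn_a = build_synopsis(part_a, k)
syn_b = build_synopsis(part_b, k)

# The compound partition's synopsis needs only syn_a and syn_b, not the raw data.
syn_union = union_synopsis(syn_a, syn_b, k)
estimate = estimate_dv(syn_union, k)
```

Because each synopsis depends only on its own partition, `build_synopsis` can run on every partition in parallel, and `union_synopsis` can be applied again to compound results without going back to the base partitions.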
Public/Granted literature
- US20090192980A1 Method for Estimating the Number of Distinct Values in a Partitioned Dataset, Public/Granted day: 2009-07-30