-
Publication No.: US10853368B2
Publication Date: 2020-12-01
Application No.: US15943496
Filing Date: 2018-04-02
Applicant: Cloudera, Inc.
Inventor: Alexander Behm, Mostafa Mokhtar
IPC: G06F16/00, G06F16/2453, G06F16/2458, G06F16/835
Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.
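The four steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation, and several choices here are assumptions made only for illustration: the function names `fit_power_law` and `estimate_distinct` are hypothetical, an exact distinct count of each sample prefix stands in for the intermediate probabilistic estimate (a real system would more likely use a sketch such as HyperLogLog), the fitted curve is assumed to be a power law y = a * x^b, and the sample is assumed to hold rows in random order.

```python
import math


def fit_power_law(points):
    """Least-squares fit of y = a * x**b on log-log axes; returns (a, b)."""
    logs = [(math.log(x), math.log(y)) for x, y in points]
    n = len(logs)
    mean_x = sum(lx for lx, _ in logs) / n
    mean_y = sum(ly for _, ly in logs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def estimate_distinct(sample, total_rows, num_points=8):
    """Extrapolate a distinct-value count from a row sample to total_rows."""
    # Step 1: pick growing prefixes of the sample as the "varying samples".
    sizes = sorted({max(1, len(sample) * i // num_points)
                    for i in range(1, num_points + 1)})
    if len(sizes) < 2:
        raise ValueError("need at least two distinct sample sizes")
    # Step 2: pair each sample size with its intermediate estimate.
    # Assumption: an exact count replaces the probabilistic estimator.
    points = [(size, len(set(sample[:size]))) for size in sizes]
    # Step 3: fit the assumed power-law curve to those points.
    a, b = fit_power_law(points)
    # Step 4: extrapolate the curve to the full row count, then clamp to
    # the range that is logically possible.
    extrapolated = a * total_rows ** b
    lower = len(set(sample))
    return round(min(max(extrapolated, lower), total_rows))
```

A caller would pass a random sample of column values together with the known or estimated row count, for example `estimate_distinct(sampled_values, total_rows=10_000_000)`, where `sampled_values` is a hypothetical list. The power law is only one candidate curve; which family of functions fits best depends on the data distribution.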
-
Publication No.: US12105712B2
Publication Date: 2024-10-01
Application No.: US18305715
Filing Date: 2023-04-24
Applicant: Cloudera, Inc.
Inventor: Alexander Behm, Mostafa Mokhtar
IPC: G06F16/2453, G06F16/2458, G06F16/835
CPC classification number: G06F16/24545, G06F16/24547, G06F16/2471, G06F16/8373, G06F16/24549
Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.
-
Publication No.: US20230350894A1
Publication Date: 2023-11-02
Application No.: US18305715
Filing Date: 2023-04-24
Applicant: Cloudera, Inc.
Inventor: Alexander Behm, Mostafa Mokhtar
IPC: G06F16/2453, G06F16/2458, G06F16/835
CPC classification number: G06F16/24545, G06F16/2471, G06F16/8373, G06F16/24547, G06F16/24549
Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.
-
Publication No.: US11663213B2
Publication Date: 2023-05-30
Application No.: US17105014
Filing Date: 2020-11-25
Applicant: Cloudera, Inc.
Inventor: Alexander Behm, Mostafa Mokhtar
IPC: G06F16/00, G06F16/2453, G06F16/2458, G06F16/835
CPC classification number: G06F16/24545, G06F16/2471, G06F16/24547, G06F16/8373, G06F16/24549
Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.
-
Publication No.: US20210149904A1
Publication Date: 2021-05-20
Application No.: US17105014
Filing Date: 2020-11-25
Applicant: Cloudera, Inc.
Inventor: Alexander Behm, Mostafa Mokhtar
IPC: G06F16/2453, G06F16/2458, G06F16/835
Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.