-
Publication No.: US20240256543A1
Publication date: 2024-08-01
Application No.: US18160861
Application date: 2023-01-27
Applicant: Databricks, Inc.
Inventor: Shoumik Palkar, Alexander Behm, Mostafa Mokhtar, Sriram Krishnamurthy
IPC: G06F16/2453, G06F11/34, G06F16/22
CPC classification number: G06F16/24545, G06F11/3409, G06F16/221
Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. A data processing service receives a request to perform a query identifying a filter column and a non-filter column in a columnar database. The data processing service accesses a first task of contiguous rows in the filter column from a cloud-based object storage. The data processing service applies a filter defined by the query to the first task. The data processing service generates filter results for the first task that may include a percentage of the first task discarded and a run-time. The data processing service determines, based on the filter results for the first task, a likelihood value that indicates a likelihood of gaining a performance benefit by applying the lazy materialization technique to a second task of the query.
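The decision described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `FilterResult`, `run_filter`, and `lazy_benefit_likelihood`, the `fetch_cost_s` parameter, and the scoring formula are all assumptions made for illustration. The sketch filters a first task of filter-column rows, records the fraction discarded and the run-time, and turns those two numbers into a likelihood that lazy materialization would pay off on a second task.

```python
import time
from dataclasses import dataclass


@dataclass
class FilterResult:
    discarded_fraction: float  # share of the task's rows rejected by the filter
    run_time_s: float          # wall-clock seconds spent applying the filter


def run_filter(task_rows, predicate):
    """Apply the query's filter to one task of contiguous filter-column rows.

    Returns the indices of surviving rows and the filter results for the task.
    """
    start = time.perf_counter()
    kept = [i for i, value in enumerate(task_rows) if predicate(value)]
    elapsed = time.perf_counter() - start
    discarded = 1.0 - len(kept) / len(task_rows) if task_rows else 0.0
    return kept, FilterResult(discarded, elapsed)


def lazy_benefit_likelihood(result, fetch_cost_s):
    """Hypothetical heuristic, not taken from the patent.

    Weighs the non-filter-column fetch time that discarded rows would save
    against the time the filter pass itself took. Returns a value in [0, 1].
    """
    saved = result.discarded_fraction * fetch_cost_s
    total = saved + result.run_time_s
    return saved / total if total > 0 else 0.0


# First task: four contiguous rows of the filter column, filter "value > 2".
kept, first = run_filter([1, 2, 3, 4], lambda v: v > 2)
# fetch_cost_s is an assumed estimate of fetching the non-filter column.
score = lazy_benefit_likelihood(first, fetch_cost_s=2.0)
```

A caller would compare `score` against a cutoff of its own choosing before deciding whether to fetch the non-filter column eagerly or lazily for the second task.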
-
Publication No.: US20240256539A1
Publication date: 2024-08-01
Application No.: US18160850
Application date: 2023-01-27
Applicant: Databricks, Inc.
Inventor: Shoumik Palkar, Alexander Behm, Mostafa Mokhtar, Sriram Krishnamurthy
IPC: G06F16/2453, G06F16/22
CPC classification number: G06F16/24539, G06F16/221
Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. The method includes receiving a request to perform a new query in a columnar database containing a plurality of columns. A step in the method includes accessing a set of data in a column of the plurality of columns based on the query. The method includes generating an input to a machine-learned model comprising characteristics of the set of data in the column. From the machine-learned model, the method includes generating a likelihood value indicative of whether a filter of a first portion of the set of data in the column has greater efficiency than a download followed by a filter of the set of data in the column. The method further includes comparing the likelihood value to a threshold value. Based on the comparison, the method includes filtering the first portion of the set of data before downloading the set of data if the likelihood value is equal to or above the threshold value.
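The threshold comparison in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed method: the abstract does not name the model type or its inputs, so the logistic form, the feature names `est_selectivity` and `column_mb`, the hand-set weights, and the default threshold of 0.5 are all hypothetical stand-ins for a trained machine-learned model.

```python
import math

# Hypothetical weights. A real system would learn these from past query runs.
WEIGHTS = {"est_selectivity": -4.0, "column_mb": 0.02}
BIAS = 1.0


def likelihood(features):
    """Logistic score in (0, 1) that filtering first is the more efficient plan."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def choose_plan(features, threshold=0.5):
    """Filter before downloading when the likelihood is at or above the threshold."""
    if likelihood(features) >= threshold:
        return "filter_first"
    return "download_then_filter"


# A filter expected to keep every row gives little reason to filter first.
plan = choose_plan({"est_selectivity": 1.0, "column_mb": 0.0})
```

The `>=` comparison mirrors the abstract's "equal to or above the threshold value" wording. Everything else about the scoring is a placeholder.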
-
Publication No.: US12124450B2
Publication date: 2024-10-22
Application No.: US18160861
Application date: 2023-01-27
Applicant: Databricks, Inc.
Inventor: Shoumik Palkar, Alexander Behm, Mostafa Mokhtar, Sriram Krishnamurthy
IPC: G06F16/2453, G06F11/34, G06F16/22
CPC classification number: G06F16/24545, G06F11/3409, G06F16/221
Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. A data processing service receives a request to perform a query identifying a filter column and a non-filter column in a columnar database. The data processing service accesses a first task of contiguous rows in the filter column from a cloud-based object storage. The data processing service applies a filter defined by the query to the first task. The data processing service generates filter results for the first task that may include a percentage of the first task discarded and a run-time. The data processing service determines, based on the filter results for the first task, a likelihood value that indicates a likelihood of gaining a performance benefit by applying the lazy materialization technique to a second task of the query.
-