Framework for providing intermediate aggregation operators in a query plan

    Publication No.: US11620287B2

    Publication date: 2023-04-04

    Application No.: US16939750

    Filing date: 2020-07-27

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation. The subject technology analyzes the at least one aggregation to generate a modified query plan, the modified query plan including at least a top aggregation operator, an intermediate aggregation operator, and a bottom aggregation operator. The subject technology performs, with respect to the intermediate aggregation operator, at least one operation comprising: the subject technology receives an input intermediate data type; the subject technology performs an internalize operation on the input intermediate data type to generate an internal state; the subject technology performs an accumulate operation on the internal state to generate intermediate data; and the subject technology performs an externalize operation on the intermediate data to generate an output data type.
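
    The three-step intermediate operator (internalize, accumulate, externalize) can be illustrated with a minimal Python sketch. It assumes an AVG aggregation whose intermediate data type is a (sum, count) pair. All names (AvgState, internalize, accumulate, externalize, bottom_aggregate, intermediate_aggregate, top_aggregate) are hypothetical, and the sketch is not Snowflake's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical external "intermediate data type" for AVG: (sum, count).
Intermediate = Tuple[float, int]


@dataclass
class AvgState:
    """Hypothetical internal state used inside the intermediate operator."""
    total: float = 0.0
    count: int = 0


def bottom_aggregate(rows: Iterable[float]) -> Intermediate:
    """Bottom operator: reduce raw input rows to a partial (sum, count)."""
    values = list(rows)
    return (float(sum(values)), len(values))


def internalize(value: Intermediate) -> AvgState:
    """Convert the incoming intermediate data type into internal state."""
    total, count = value
    return AvgState(float(total), count)


def accumulate(state: AvgState, other: AvgState) -> AvgState:
    """Merge one internal state into the running state."""
    return AvgState(state.total + other.total, state.count + other.count)


def externalize(state: AvgState) -> Intermediate:
    """Convert internal state back into the outgoing intermediate data type."""
    return (state.total, state.count)


def intermediate_aggregate(inputs: List[Intermediate]) -> Intermediate:
    """Intermediate operator: internalize -> accumulate -> externalize."""
    state = AvgState()
    for value in inputs:
        state = accumulate(state, internalize(value))
    return externalize(state)


def top_aggregate(inputs: List[Intermediate]) -> float:
    """Top operator: merge the remaining partials and finalize AVG."""
    total, count = intermediate_aggregate(inputs)
    return total / count if count else float("nan")


# Three partitions feed the bottom operators; two intermediate operators
# each combine some of the partials before the top operator finalizes.
partials = [bottom_aggregate(p) for p in ([1, 2, 3], [4, 5], [6])]
merged = [intermediate_aggregate(partials[:2]), intermediate_aggregate(partials[2:])]
print(top_aggregate(merged))  # 3.5
```

    Keeping a separate internal state lets an intermediate operator merge in whatever representation is convenient while exchanging a fixed external type with the operators above and below it; for AVG the two representations happen to coincide.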

    Automatic pruning cutoff in a database system

    Publication No.: US11615095B2

    Publication date: 2023-03-28

    Application No.: US17162979

    Filing date: 2021-01-29

    Applicant: Snowflake Inc.

    Abstract: During a query compilation process, a query is received that is directed to a set of source tables, each source table from the set of source tables being organized into at least one micro-partition and the query including at least one pruning operation. During the query compilation process, a modification of the query is performed for adjusting the at least one pruning operation, the modification being based on a set of statistics collected for previous pruning operations on at least a portion of the set of source tables and a set of heuristics, the set of statistics indicating at least an amount of execution time for each previous query associated with each of the previous pruning operations. The query is compiled including the modification of the query. The compiled query is provided to an execution node of a database system for execution.
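
    A minimal Python sketch of how statistics from earlier pruning operations plus a few heuristics might adjust a pruning operation at compile time. It assumes the adjustment is a cap (a "cutoff") on how many micro-partitions are tested; the PruningStat fields, the thresholds, and choose_pruning_cutoff are hypothetical and are not Snowflake's actual heuristics.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class PruningStat:
    """Hypothetical statistics for one earlier pruning operation on a table."""
    pruning_ms: float       # time spent evaluating pruning predicates
    execution_ms: float     # total execution time of that query
    pruned_fraction: float  # share of micro-partitions eliminated, 0..1


def choose_pruning_cutoff(history: List[PruningStat],
                          default_cutoff: int = 1000,
                          min_pruned_fraction: float = 0.1,
                          max_time_share: float = 0.3) -> int:
    """Return how many micro-partitions the compiled plan may test for pruning.

    Illustrative heuristics only:
      * no statistics yet: keep the default cutoff;
      * pruning rarely eliminates partitions: skip it (cutoff 0);
      * pruning takes a large share of execution time: halve the cutoff.
    """
    if not history:
        return default_cutoff
    if mean(s.pruned_fraction for s in history) < min_pruned_fraction:
        return 0
    shares = [s.pruning_ms / s.execution_ms for s in history if s.execution_ms > 0]
    if shares and mean(shares) > max_time_share:
        return default_cutoff // 2
    return default_cutoff
```

    For example, a single prior query that spent 40 of 100 ms pruning and eliminated half the micro-partitions yields a cutoff of 500 under these assumed thresholds.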

    SCALABLE QUERY PROCESSING
    Invention patent application

    Publication No.: US20220414097A1

    Publication date: 2022-12-29

    Application No.: US17823572

    Filing date: 2022-08-31

    Applicant: Snowflake Inc.

    Abstract: Embodiments of the present disclosure may provide a dynamic query execution model. This query execution model may provide acceleration by scaling out parallel parts of a query (also referred to as a fragment) to additional computing resources, for example computing resources leased from a pool of computing resources. Execution of the parts of the query may be coordinated by a parent query coordinator, where the query originated, and a fragment query coordinator.
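
    The scale-out pattern can be sketched in Python under simplifying assumptions: threads stand in for computing resources leased from a pool, the parallel fragment is a filter plus a partial SUM, and the function names are hypothetical. It shows only the split, parallel execution, and combine pattern, not Snowflake's coordination protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def fragment_coordinator(fragment: Callable[[Sequence[int]], int],
                         chunks: List[Sequence[int]],
                         extra_workers: int) -> List[int]:
    """Run one parallel fragment over its input chunks on additional workers.

    Threads stand in for compute resources leased from a shared pool.
    """
    with ThreadPoolExecutor(max_workers=extra_workers) as pool:
        return list(pool.map(fragment, chunks))


def even_sum_fragment(chunk: Sequence[int]) -> int:
    """The parallel part of the query: a filter plus a partial SUM."""
    return sum(x for x in chunk if x % 2 == 0)


def parent_coordinator(rows: Sequence[int], extra_workers: int = 4) -> int:
    """Where the query originated: split, scale out, then combine serially."""
    size = max(1, -(-len(rows) // extra_workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    partials = fragment_coordinator(even_sum_fragment, chunks, extra_workers)
    return sum(partials)  # final, non-parallel step of the query
```

    For instance, parent_coordinator(list(range(10))) splits ten rows into four chunks, sums the even values of each chunk in parallel, and returns 20.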

    Merge small file consolidation
    Invention patent grant

    Publication No.: US11537613B1

    Publication date: 2022-12-27

    Application No.: US17514084

    Filing date: 2021-10-29

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a query plan corresponding to a query. The subject technology executes the query based at least in part on the query plan, the executing including: filtering a first set of files that are to be modified by a merge statement, performing a split operation to send information related to a second set of files to a scan set builder operation in a first portion of the query plan and scan back operation in a second portion of the query plan, performing the scan set builder operation to remove the second set of files from the first set of files, performing a table scan operation based on a third set of files, and performing a first union all operation to combine the first set of data with a second set of data as a first set of combined data.

    SYSTEM AND METHOD FOR DISJUNCTIVE JOINS

    Publication No.: US20220391390A1

    Publication date: 2022-12-08

    Application No.: US17879615

    Filing date: 2022-08-02

    Applicant: SNOWFLAKE INC.

    Abstract: Joining data using a disjunctive operator is described. An example computer-implemented method can include generating a query plan for a query, wherein there is a join operator expression for each of a plurality of disjunctive predicates and each join operator expression includes at least a conjunctive predicate and a disjunctive operator. The method may also include generating a bloom filter for each of the plurality of disjunctive operators. The method may further include evaluating each of the plurality of join operator expressions using a corresponding one of the plurality of disjunctive operators and bloom filter for each of the plurality of disjunctive predicates to generate a result set.
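
    For a join condition such as left.a = right.a OR left.b = right.b, one reading of the abstract is one hash table and one Bloom filter per disjunct, with the per-disjunct matches unioned and deduplicated. The Python sketch below assumes that reading; BloomFilter and disjunctive_join are hypothetical illustrations, not Snowflake's operator.

```python
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Any]


class BloomFilter:
    """Small Bloom filter: k bit positions per key in an m-bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: Hashable):
        # hash() is consistent within a process and equal keys (e.g. 1 and
        # 1.0) hash identically, so the filter has no false negatives.
        for seed in range(self.num_hashes):
            yield hash((seed, key)) % self.num_bits

    def add(self, key: Hashable) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: Hashable) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def disjunctive_join(left: Sequence[Row], right: Sequence[Row],
                     columns: Sequence[str]) -> List[Tuple[int, int]]:
    """Inner join on left.c1 = right.c1 OR left.c2 = right.c2 OR ...

    Returns sorted (left_index, right_index) pairs without duplicates.
    """
    matches = set()
    for col in columns:  # one hash table and Bloom filter per disjunct
        table: Dict[Hashable, List[int]] = {}
        bloom = BloomFilter()
        for j, row in enumerate(right):
            if row[col] is not None:  # SQL NULL never compares equal
                table.setdefault(row[col], []).append(j)
                bloom.add(row[col])
        for i, row in enumerate(left):
            key = row[col]
            if key is None or not bloom.might_contain(key):
                continue  # cheap rejection before the hash-table probe
            for j in table.get(key, []):
                matches.add((i, j))  # a pair matching two disjuncts is kept once
    return sorted(matches)
```

    Bloom-filter false positives only cost an extra hash-table lookup, so they never change the result; the filter's value is in skipping probes for keys that cannot match.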

    Resource provisioning in database systems

    Publication No.: US11514064B2

    Publication date: 2022-11-29

    Application No.: US17663248

    Filing date: 2022-05-13

    Applicant: Snowflake Inc.

    Abstract: Resource provisioning systems and methods are described. In an embodiment, a system includes a plurality of shared storage devices collectively storing database data, an execution platform, and a compute service manager. The compute service manager is configured to determine a task to be executed in response to a trigger event and determine a query plan for executing the task, wherein the query plan comprises a plurality of discrete subtasks. The compute service manager is further configured to assign the plurality of discrete subtasks to one or more nodes of a plurality of nodes of the execution platform, determine whether execution of the task is complete, and in response to determining the execution of the task is complete, store a record in the plurality of shared storage devices indicating the task was completed.
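
    The compute service manager's control flow can be sketched as follows, assuming round-robin assignment, string subtask identifiers, and a dictionary standing in for the shared storage devices. ComputeServiceManager and its methods are hypothetical names; the sketch omits real scheduling, retries, and storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ComputeServiceManager:
    """Hypothetical compute service manager coordinating execution nodes."""
    nodes: List[str]
    # A plain dict stands in for the shared storage devices.
    shared_storage: Dict[str, str] = field(default_factory=dict)

    def plan(self, task: str, num_subtasks: int) -> List[str]:
        """Query plan for a task: a list of discrete subtasks."""
        return [f"{task}/subtask-{i}" for i in range(num_subtasks)]

    def assign(self, subtasks: List[str]) -> Dict[str, List[str]]:
        """Round-robin the subtasks over the execution nodes."""
        assignment: Dict[str, List[str]] = {node: [] for node in self.nodes}
        for i, subtask in enumerate(subtasks):
            assignment[self.nodes[i % len(self.nodes)]].append(subtask)
        return assignment

    def on_trigger(self, task: str, num_subtasks: int,
                   execute: Callable[[str, str], bool]) -> bool:
        """Handle a trigger event: plan, assign, run, then record completion.

        all() stops at the first subtask that reports failure.
        """
        assignment = self.assign(self.plan(task, num_subtasks))
        complete = all(execute(node, subtask)
                       for node, subtasks in assignment.items()
                       for subtask in subtasks)
        if complete:
            self.shared_storage[task] = "completed"
        return complete
```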

    Feature release and workload capture in database systems

    Publication No.: US11500838B1

    Publication date: 2022-11-15

    Application No.: US17869071

    Filing date: 2022-07-20

    Applicant: Snowflake Inc.

    Abstract: Systems, methods, and devices for feature release and workload capture in database systems are disclosed. The method includes determining a workload based on one or more client queries to be rerun to test a feature that is unreleased to one or more database clients. The method includes repeatedly executing a test run of the workload to determine a stability factor of the test run. The method includes re-executing, in response to determining the stability factor of the test run, the test run using resources with a different concurrency to confirm the stability factor of the test run. The method includes releasing the feature to the one or more database clients in response to confirming the stability factor of the test run.
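
    A minimal Python sketch of the release gate, assuming a hypothetical stability factor: 0 if repeated runs disagree on results, otherwise 1 minus the coefficient of variation of run times. The abstract does not define how the stability factor is computed, so the metric, threshold, concurrency levels, and function names here are all assumptions.

```python
import statistics
from typing import Callable, Hashable, List, Sequence, Tuple

# One test run of the workload: (result fingerprint, elapsed seconds).
RunResult = Tuple[Hashable, float]


def stability_factor(runs: List[RunResult]) -> float:
    """Hypothetical stability metric in [0, 1]; needs at least one run.

    0.0 if any run returned different results; otherwise 1 minus the
    coefficient of variation of elapsed times, floored at 0.
    """
    results = [result for result, _ in runs]
    if any(result != results[0] for result in results):
        return 0.0
    times = [elapsed for _, elapsed in runs]
    mean_time = statistics.mean(times)
    if mean_time == 0:
        return 1.0
    return max(0.0, 1.0 - statistics.pstdev(times) / mean_time)


def ready_to_release(run_workload: Callable[[int], RunResult],
                     repeats: int = 5,
                     concurrencies: Sequence[int] = (1, 8),
                     threshold: float = 0.9) -> bool:
    """Repeat the test run, then confirm stability at a different concurrency."""
    for concurrency in concurrencies:
        runs = [run_workload(concurrency) for _ in range(repeats)]
        if stability_factor(runs) < threshold:
            return False
    return True
```

    Re-checking at a second concurrency level is meant to catch instability that only appears when queries compete for resources.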

    Pruning cutoffs for database systems

    Publication No.: US11475011B2

    Publication date: 2022-10-18

    Application No.: US17540945

    Filing date: 2021-12-02

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives, during a query compilation process, a query directed to a set of source tables, each source table from the set of source tables being organized into at least one micro-partition and the query including at least one pruning operation. The subject technology performs, during the query compilation process, a modification of the query for adjusting the at least one pruning operation, the modification being based at least in part on a set of statistics collected for previous pruning operations on at least a portion of the set of source tables and a set of heuristics. The subject technology compiles the query including the modification of the query. The subject technology provides the compiled query to an execution node of a database system for execution.

    IDENTIFICATION OF OPTIMAL CLOUD RESOURCES FOR EXECUTING WORKLOADS

    Publication No.: US20220318215A1

    Publication date: 2022-10-06

    Application No.: US17842642

    Filing date: 2022-06-16

    Applicant: SNOWFLAKE INC.

    Abstract: A system to repeatedly execute a test run of a workload using resources of a cloud environment to determine whether there is a performance difference in the test run. The system to, in response to determining that there is no performance difference, identify one or more sets of decreased resources of the cloud environment. The system to re-execute the test run using the one or more sets of decreased resources of the cloud environment to determine whether there is a performance difference in the test run that is attributed to the one or more sets of decreased resources of the cloud environment. The system to determine minimum resources of the cloud environment to repeatedly execute the test run using the minimum resources without existence of a performance difference in response to re-executing the test run using the one or more sets of decreased resources of the cloud environment.
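
    A minimal Python sketch of the resource search, assuming resources are integer levels (for example, node counts), that runtime does not improve as resources shrink, and that a "performance difference" means a median slowdown beyond a fixed tolerance. find_minimum_resources and its parameters are hypothetical.

```python
import statistics
from typing import Callable, Sequence


def find_minimum_resources(run: Callable[[int], float],
                           levels: Sequence[int],
                           repeats: int = 3,
                           tolerance: float = 0.05) -> int:
    """Smallest resource level whose runtime matches the full-size baseline.

    `run(level)` executes the test run with `level` units of resources
    (hypothetically, nodes) and returns the elapsed time.
    """
    ordered = sorted(levels, reverse=True)
    baseline_runs = [run(ordered[0]) for _ in range(repeats)]
    if max(baseline_runs) > min(baseline_runs) * (1 + tolerance):
        # The baseline itself varies, so a slowdown could not be attributed
        # to smaller resources; keep the largest level.
        return ordered[0]
    baseline = statistics.median(baseline_runs)
    minimum = ordered[0]
    for level in ordered[1:]:
        elapsed = statistics.median([run(level) for _ in range(repeats)])
        if elapsed > baseline * (1 + tolerance):
            break  # performance difference attributable to fewer resources
        minimum = level
    return minimum
```

    With run = lambda n: 10.0 if n >= 4 else 20.0 and levels [1, 2, 4, 8, 16], the search detects a slowdown at 2 and returns 4. Stopping at the first regression relies on the monotonicity assumption above.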
