PIPELINE LEVEL OPTIMIZATION OF AGGREGATION OPERATORS IN A QUERY PLAN DURING RUNTIME

    Publication number: US20210089535A1

    Publication date: 2021-03-25

    Application number: US16857817

    Filing date: 2020-04-24

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation and a join operation, the join operation including a build side and a probe side. The subject technology inserts an aggregation operator below the probe side of the join operation. The subject technology causes the build side of the join operation to generate a hash table. The subject technology causes the build side of the join operation to generate a bloom filter based at least in part on the hash table and provide information, corresponding to properties of the build side, to a bloom filter. Based at least in part on the information, the subject technology determines at least one property of the join operation to determine whether to switch the aggregation operator to a pass through mode.
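
    The runtime decision described in this abstract can be illustrated with a minimal sketch. Everything named here (`BuildSideInfo`, `ProbeSideAggregate`, the `max_fanout` threshold) is hypothetical, and the fan-out heuristic is an assumption chosen for illustration: the abstract does not say which join property drives the switch. The bloom filter is omitted and the build-side information is handed to the operator directly.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BuildSideInfo:
    """Hypothetical properties the build side reports after building its hash table."""
    row_count: int
    distinct_keys: int


def build_hash_table(rows, key):
    """Build the join hash table and summarize it for the probe-side operator."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table, BuildSideInfo(row_count=len(rows), distinct_keys=len(table))


class ProbeSideAggregate:
    """Sums `value` per `key` below the probe side, or forwards rows unchanged."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.pass_through = False

    def configure(self, info, max_fanout=1.0):
        # Assumed heuristic (not from the abstract): when each build key has at
        # most `max_fanout` rows on average, the join does not multiply probe
        # rows, so pre-aggregation is treated as not worth its cost.
        fanout = info.row_count / info.distinct_keys if info.distinct_keys else 0.0
        self.pass_through = fanout <= max_fanout

    def process(self, rows):
        if self.pass_through:
            return list(rows)  # pass-through mode: no aggregation work
        sums = defaultdict(int)
        for row in rows:
            sums[row[self.key]] += row[self.value]
        return [{self.key: k, self.value: v} for k, v in sums.items()]
```

    In this sketch the operator is configured once, after the build side finishes, and then applied to each probe batch; a real engine could revisit the decision as more statistics arrive.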

    Reclustering of database tables based on peaks and widths

    Publication number: US10956394B2

    Publication date: 2021-03-23

    Application number: US16941215

    Filing date: 2020-07-28

    Applicant: Snowflake Inc.

    Abstract: The subject technology determines whether a table is sufficiently clustered. The subject technology, in response to determining the table is not sufficiently clustered, selects one or more micro-partitions of the table to be reclustered. The subject technology constructs a data structure for the table. The subject technology extracts minimum and maximum endpoints for each micro-partition in the data structure. The subject technology sorts each of one or more peaks in the data structure based on height. The subject technology sorts overlapping micro-partitions based on width. The subject technology selects based on which micro-partitions are within the tallest peaks of the one or more peaks and further based on which of the overlapping micro-partitions have the widest widths.
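
    A rough sketch of the peak-and-width selection, under simplifying assumptions: each micro-partition is reduced to an inclusive `(min, max)` range on the clustering key, "height" is taken to be the number of ranges overlapping at a point, and only the single tallest peak is considered. The function name `select_for_reclustering` and the `limit` parameter are hypothetical.

```python
def select_for_reclustering(partitions, limit):
    """Return (peak height, names of the widest partitions covering the peak).

    `partitions` maps a partition name to its inclusive (min, max) key range.
    """
    if not partitions:
        return 0, []

    # Sweep over endpoints; opens sort before closes at the same point so that
    # ranges touching at an endpoint count as overlapping.
    events = []
    for lo, hi in partitions.values():
        events.append((lo, 0, +1))
        events.append((hi, 1, -1))
    events.sort()

    depth = 0
    peak_height = 0
    peak_point = None
    for point, _, delta in events:
        depth += delta
        if depth > peak_height:
            peak_height, peak_point = depth, point

    # Partitions under the tallest peak, widest first.
    covering = [n for n, (lo, hi) in partitions.items() if lo <= peak_point <= hi]
    covering.sort(key=lambda n: partitions[n][1] - partitions[n][0], reverse=True)
    return peak_height, covering[:limit]
```

    For example, with ranges `a=(0, 10)`, `b=(2, 4)`, `c=(3, 20)` and `d=(30, 40)`, the tallest peak has height 3 at key 3, and with `limit=2` the sketch picks `c` and `a`, the two widest partitions under that peak.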

    Flexible computing
    Granted invention patent

    Publication number: US10860381B1

    Publication date: 2020-12-08

    Application number: US16874388

    Filing date: 2020-05-14

    Applicant: Snowflake Inc.

    Abstract: Embodiments of the present disclosure may provide dynamic and fair assignment techniques for allocating resources on a demand basis. Assignment control may be separated into at least two components: a local component and a global component. Each component may have an active dialog with each other; the dialog may include two aspects: 1) a demand for computing resources, and 2) a total allowed number of computing resources. The global component may allocate resources from a pool of resources to different local components, and the local components in turn may assign their allocated resources to local competing requests. The allocation may also be throttled or limited at various levels.
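
    One way to picture the global component's side of the dialog is a max-min style split of a resource pool. This is only a sketch under assumptions: the abstract does not name a fairness policy, resources are treated as interchangeable integer units, and `allocate_fairly`, `demands`, and `caps` are hypothetical names (`caps` plays the role of the total allowed number of resources per local component).

```python
def allocate_fairly(pool, demands, caps=None):
    """Split `pool` units across local components, approximately max-min fair.

    `demands` maps a local component to the units it asks for; `caps` optionally
    maps a component to the maximum it is allowed to hold.
    """
    caps = caps or {}
    want = {name: min(d, caps.get(name, d)) for name, d in demands.items()}
    grant = {name: 0 for name in demands}
    remaining = pool
    active = sorted(name for name in want if want[name] > 0)

    while remaining > 0 and active:
        # Offer every unsatisfied component an equal share of what is left.
        share = max(remaining // len(active), 1)
        for name in list(active):
            if remaining == 0:
                break
            give = min(share, want[name] - grant[name], remaining)
            grant[name] += give
            remaining -= give
            if grant[name] == want[name]:
                active.remove(name)  # satisfied; its unused share is redistributed
    return grant
```

    With a pool of 10 and demands of 2, 8, and 8, the small request is met in full and the two large ones receive 4 each. Each local component would then divide its grant among its own competing requests, possibly with the same routine.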

    HIDDEN DATABASE OBJECTS OVER EXTERNAL DATA

    Publication number: US20240427794A1

    Publication date: 2024-12-26

    Application number: US18513140

    Filing date: 2023-11-17

    Applicant: Snowflake Inc.

    Abstract: The subject technology provides techniques for enabling hidden database objects, which in an example are utilized for testing and verifying new database objects against existing workloads. Hidden database objects are a mechanism for bridging that gap by running user workloads on user data in advance of exposing the feature to users. This mechanism allows placing a database object as a hidden object nested beneath a user's visible object (e.g., table, column, view, and the like). Hidden database objects can be used to verify the functionality, parity, performance, and correctness of new unreleased features.

    UNIFIED STRUCTURED AND SEMI-STRUCTURED DATA TYPES IN DATABASE SYSTEMS

    Publication number: US20240427790A1

    Publication date: 2024-12-26

    Application number: US18497746

    Filing date: 2023-10-30

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a query, the query referencing a unified representation for structured type data and semi-structured type data, the unified representation being provided in storage and in memory during query processing, the unified representation comprising a set of structured type fields that include a set of semi-structured typed fields that enables type safety and enforcement for the set of structured type fields, and flexibility for the set of semi-structured typed fields in a same column, the unified representation in storage including type information for the semi-structured type data as part of the semi-structured type data, the unified representation being utilized for structured type data and semi-structured type data. The subject technology processes the query using the unified representation stored in the memory, the unified representation providing performance parity between structured type data and semi-structured type data.

    Multi-phase query plan caching
    Granted invention patent

    Publication number: US12135715B1

    Publication date: 2024-11-05

    Application number: US18309490

    Filing date: 2023-04-28

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a query, the query including a statement for performing the query. The subject technology performs a first lookup operation on a multi-phase cache based on the query. The subject technology performs, in response to a first cache miss of the multi-phase cache, parsing of the statement from the query. The subject technology performs, based on the parsing, a compilation process on the query to generate a compiled query plan, the compilation process determining an optimization and a generalization for the query. The subject technology determines that the compiled query plan is cacheable. The subject technology registers, in response to the compiled query plan being cacheable, a dummy entry in the multi-phase cache.
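
    The sequence in this abstract (lookup, miss, parse, compile, cacheability check, dummy entry) can be sketched as follows. The `parse` and `compile_plan` helpers are hypothetical stand-ins, the cache is a plain dictionary keyed on the raw statement text, and the step that later replaces the dummy entry with a real plan is an assumption: the abstract stops at registering the dummy entry.

```python
PLACEHOLDER = object()  # stands in for the "dummy entry" of the abstract


def parse(statement):
    # Hypothetical parser: only normalizes whitespace and case.
    return " ".join(statement.split()).lower()


def compile_plan(parsed):
    # Hypothetical compiler: statements calling random() are treated as
    # non-cacheable, everything else as cacheable.
    return {"plan_for": parsed, "cacheable": "random()" not in parsed}


def handle_query(cache, statement):
    """Return (plan, status) and update `cache` as a side effect."""
    entry = cache.get(statement)  # first lookup
    if entry is not None and entry is not PLACEHOLDER:
        return entry, "hit"

    plan = compile_plan(parse(statement))  # cache miss: parse, then compile
    if not plan["cacheable"]:
        return plan, "uncacheable"

    if entry is PLACEHOLDER:
        # Assumed later phase: a repeated statement promotes the dummy entry.
        cache[statement] = plan
        return plan, "filled"

    cache[statement] = PLACEHOLDER  # register the dummy entry
    return plan, "registered"
```

    Under these assumptions a statement is compiled twice before its plan is served from the cache, which keeps one-off statements from occupying cache space with full plans.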

    Transient materialized view rewrite

    Publication number: US12026159B2

    Publication date: 2024-07-02

    Application number: US18059125

    Filing date: 2022-11-28

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24542

    Abstract: Queries executed against a materialized view can execute up to orders of magnitude faster than equivalent queries on a source (or base) table. However, although a query can reference a materialized view directly, a user (e.g., query author) may not know about a relevant materialized view. Moreover, if a source table has multiple materialized views generated, the user may not know which materialized view to reference in the query. Thus, embodiments of the present disclosure provide techniques for automatically rewriting queries directed to a source table to utilize existing materialized views.
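
    A toy version of such a rewrite, assuming a deliberately narrow matching rule: a view can answer a query only if it is defined on the same source table, groups by exactly the same columns, and already contains every aggregate the query asks for. The types `AggQuery` and `MaterializedView` and the function `rewrite` are hypothetical and do not reflect any particular engine's matching logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggQuery:
    table: str
    group_by: frozenset
    aggregates: frozenset  # e.g. frozenset({"sum(amount)"})


@dataclass(frozen=True)
class MaterializedView:
    name: str
    definition: AggQuery


def rewrite(query, views):
    """Return the relation to scan: a matching view's name, else the source table."""
    candidates = [
        v for v in views
        if v.definition.table == query.table
        and v.definition.group_by == query.group_by
        and query.aggregates <= v.definition.aggregates
    ]
    if not candidates:
        return query.table
    # Among several usable views, prefer the one carrying the fewest aggregates.
    return min(candidates, key=lambda v: len(v.definition.aggregates)).name
```

    The query author references only the source table; the choice among several candidate views happens inside `rewrite`.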

    Performance indexing of production databases

    Publication number: US12026140B1

    Publication date: 2024-07-02

    Application number: US18112198

    Filing date: 2023-02-21

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/2228 G06F16/254

    Abstract: Methods, systems, and computer programs are presented for providing performance metrics in an online performance analysis system employing customer production workloads. A plurality of metric source data is received from a cloud data platform. A workload is identified as a stable workload candidate based at least in part on the plurality of metric source data. The cloud data platform generates a performance index based on the workload being identified as a stable workload candidate. The performance index is tracked over a period of time to identify changes in workload.
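
    As a loose illustration only: the abstract does not define how stability is judged or how the index is computed, so the sketch below assumes a coefficient-of-variation test on a per-period metric and an index expressed as baseline runtime relative to current runtime. The names `is_stable`, `performance_index`, and `max_cv` are hypothetical.

```python
from statistics import mean, pstdev


def is_stable(samples, max_cv=0.1):
    """Assumed stability test: low relative spread of a per-period metric."""
    avg = mean(samples)
    return avg > 0 and pstdev(samples) / avg <= max_cv


def performance_index(baseline_runtimes, current_runtimes):
    """Assumed index: 100 means parity with the baseline, higher means faster."""
    return 100.0 * mean(baseline_runtimes) / mean(current_runtimes)
```

    Tracking the index over successive periods for workloads that pass `is_stable` would then surface changes that are less likely to be caused by the workload itself shifting.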
