PROCESSING QUERIES ON SEMI-STRUCTURED DATA COLUMNS

    Publication number: US20220207041A1

    Publication date: 2022-06-30

    Application number: US17655124

    Filing date: 2022-03-16

    Applicant: Snowflake Inc.

    Abstract: A source table organized into a set of batch units is accessed. The source table comprises a column of data corresponding to a semi-structured data type. One or more indexing transformations for an object in the column are generated. The generating of the one or more indexing transformations includes converting the object to one or more stored data types. A pruning index is generated for the source table based in part on the one or more indexing transformations. The pruning index comprises a set of filters that index distinct values in each column of the source table, and each filter corresponds to a batch unit in the set of batch units. The pruning index is stored in a database with an association with the source table.
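The abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the dotted-path keys, and the use of a plain Python `set` as each per-batch "filter" are assumptions made for illustration only (the abstract does not say which filter structure is used). The sketch shows one leaf value being indexed under more than one stored type, so that a lookup for the number 2 can also match the text "2".

```python
def flatten(obj, prefix=""):
    """Yield (path, leaf) pairs for a nested dict; non-dict values are leaves."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    else:
        yield prefix, obj


def indexing_transformations(path, value):
    """Yield the typed keys under which one leaf value is indexed (assumed scheme)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        yield (path, "bool", value)
    elif isinstance(value, (int, float)):
        yield (path, "number", float(value))
        yield (path, "string", str(value))
    elif isinstance(value, str):
        yield (path, "string", value)
        try:
            yield (path, "number", float(value))
        except ValueError:
            pass  # not numeric text; index as a string only


def build_pruning_index(batch_units):
    """Return one filter (here a plain set, as a stand-in) per batch unit."""
    index = []
    for rows in batch_units:
        batch_filter = set()
        for row in rows:
            for path, leaf in flatten(row):
                batch_filter.update(indexing_transformations(path, leaf))
        index.append(batch_filter)
    return index


def candidate_batches(index, path, value):
    """Return positions of batch units that may contain path == value."""
    keys = set(indexing_transformations(path, value))
    return [i for i, batch_filter in enumerate(index) if keys & batch_filter]
```

A query for `user.id = 2` would then scan only the batch units returned by `candidate_batches(index, "user.id", 2)` and skip the rest.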

    Scalable query processing
    Granted invention patent

    Publication number: US11347735B2

    Publication date: 2022-05-31

    Application number: US16889033

    Filing date: 2020-06-01

    Applicant: Snowflake Inc.

    Abstract: Embodiments of the present disclosure may provide a dynamic query execution model. This query execution model may provide acceleration by scaling out parallel parts of a query (also referred to as a fragment) to additional computing resources, for example computing resources leased from a pool of computing resources. Execution of the parts of the query may be coordinated by a parent query coordinator, where the query originated, and a fragment query coordinator.

    Pruning index to support semi-structured data types

    Publication number: US11308090B2

    Publication date: 2022-04-19

    Application number: US17394149

    Filing date: 2021-08-04

    Applicant: Snowflake Inc.

    Abstract: A source table organized into a set of batch units is accessed. The source table comprises a column of data corresponding to a semi-structured data type. One or more indexing transformations for an object in the column are generated. The generating of the one or more indexing transformations includes converting the object to one or more stored data types. A pruning index is generated for the source table based in part on the one or more indexing transformations. The pruning index comprises a set of filters that index distinct values in each column of the source table, and each filter corresponds to a batch unit in the set of batch units. The pruning index is stored in a database with an association with the source table.

    Processing of queries over external tables

    Publication number: US11269869B2

    Publication date: 2022-03-08

    Application number: US17498382

    Filing date: 2021-10-11

    Applicant: Snowflake Inc.

    Abstract: Disclosed herein are systems and methods for processing queries over external tables. In an embodiment, a database platform receives a query directed at least to data in an external table stored in a storage platform that is external to the database platform. The database platform uses metadata that summarizes the data in the external table to identify one or more partitions of the external table as potentially including data satisfying the query, and generates a query plan that includes a plurality of discrete subtasks that collectively include instructions to scan the identified one or more partitions of the external table for data satisfying the query. The database platform assigns, based on the metadata, the plurality of discrete subtasks to one or more nodes in an execution platform, and refreshes the metadata in response to a threshold number of modifications being made to the external table.
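The flow described in this abstract can be sketched roughly as follows. Everything named here is hypothetical: the `PartitionMeta` fields (min/max values and a row count), the greedy least-loaded assignment policy, and the simple counter check for refreshing are assumptions chosen for illustration, since the abstract does not specify what the metadata contains or how subtasks are assigned.

```python
from dataclasses import dataclass


@dataclass
class PartitionMeta:
    """Assumed per-partition summary of an external table."""
    name: str
    min_value: int
    max_value: int
    row_count: int


def candidate_partitions(metadata, lo, hi):
    """Keep partitions whose [min, max] range overlaps the query range [lo, hi]."""
    return [m for m in metadata if m.max_value >= lo and m.min_value <= hi]


def plan_subtasks(partitions, nodes):
    """Assign one scan subtask per partition to the currently least-loaded node."""
    load = {node: 0 for node in nodes}
    plan = {node: [] for node in nodes}
    # Place the largest partitions first so the loads stay roughly balanced.
    for part in sorted(partitions, key=lambda m: m.row_count, reverse=True):
        target = min(load, key=load.get)
        plan[target].append(part.name)
        load[target] += part.row_count
    return plan


def should_refresh(pending_modifications, threshold):
    """Refresh the metadata once enough modifications have accumulated."""
    return pending_modifications >= threshold
```

With three partitions covering the ranges 0 to 9, 10 to 19, and 20 to 29, a query for values between 12 and 25 would produce scan subtasks for the second and third partitions only.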

    INCREMENTAL RECLUSTERING OF DATABASE TABLES USING RECLUSTERING-COUNT LEVELS

    Publication number: US20220067016A1

    Publication date: 2022-03-03

    Application number: US17511064

    Filing date: 2021-10-26

    Applicant: Snowflake Inc.

    Abstract: The subject technology determines whether a table is sufficiently clustered. The subject technology, in response to determining that the table is not sufficiently clustered, selects one or more micro-partitions of the table to be reclustered. The subject technology constructs a data structure for the table. The subject technology extracts minimum and maximum endpoints for each micro-partition in the data structure. The subject technology sorts each of one or more peaks in the data structure based on height. The subject technology sorts overlapping micro-partitions based on width. The subject technology selects based on which micro-partitions are within the tallest peaks of the one or more peaks and further based on which of the overlapping micro-partitions have the widest widths.
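One possible reading of the selection steps in this abstract is sketched below. It is an interpretation, not the claimed method: a "peak" is taken to be an endpoint where many micro-partition ranges overlap, its "height" is the number of overlapping ranges, and a partition's "width" is its maximum minus its minimum. The function name and the `ok_depth` and `limit` parameters are hypothetical.

```python
def select_for_reclustering(partitions, limit, ok_depth=1):
    """Pick up to `limit` micro-partitions from the tallest overlap peak.

    partitions: dict mapping a name to its (min_endpoint, max_endpoint).
    ok_depth: assumed threshold; if no point is covered by more than this
              many partitions, the table is treated as sufficiently clustered.
    """
    if not partitions:
        return []

    def covering(point):
        return [n for n, (lo, hi) in partitions.items() if lo <= point <= hi]

    # The deepest overlap of closed ranges always occurs at some endpoint,
    # so only the extracted endpoints need to be examined.
    points = sorted({p for lo, hi in partitions.values() for p in (lo, hi)})

    # Sort peaks by height (number of overlapping partitions), tallest first.
    peaks = sorted(points, key=lambda pt: len(covering(pt)), reverse=True)
    tallest = peaks[0]
    overlapping = covering(tallest)
    if len(overlapping) <= ok_depth:
        return []  # sufficiently clustered under the assumed threshold

    # Within the tallest peak, prefer the widest micro-partitions.
    overlapping.sort(key=lambda n: partitions[n][1] - partitions[n][0], reverse=True)
    return overlapping[:limit]
```

In this sketch, wide partitions inside a tall peak are chosen first because they overlap the most other ranges, which is consistent with the abstract's preference for the widest widths.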
