Managing Real Time Data Stream Processing

    Publication number: US20230070710A1

    Publication date: 2023-03-09

    Application number: US18054632

    Application date: 2022-11-11

    Applicant: Google LLC

    Abstract: A method for managing data processing includes receiving, from a user of a data query system, a data query for data stored in a data store in communication with the data query system. The method also includes receiving a staleness parameter indicating an upper time boundary for the data query. The upper time boundary limits a query response to data within the data store that is older than the upper time boundary. The method further includes determining whether the data stored within the data store satisfies the staleness parameter. When a portion of the data within the data store fails to satisfy the staleness parameter, the method includes generating the query response that excludes the portion of the data that fails to satisfy the staleness parameter.
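
The filtering step in this abstract can be pictured with a minimal sketch. Everything named here (`Row`, `written_at`, `query_with_staleness`) is hypothetical and not taken from the patent, and the sketch assumes that "older than the upper time boundary" means a row timestamp strictly before `now - staleness`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Row:
    """Hypothetical stored row with the time it was written."""
    key: str
    written_at: datetime


def query_with_staleness(rows: List[Row], now: datetime,
                         staleness: timedelta) -> List[Row]:
    """Return only rows older than the upper time boundary.

    Assumption: the boundary is `now - staleness`, and a row satisfies the
    staleness parameter when it was written strictly before that boundary.
    """
    boundary = now - staleness
    # Rows written at or after the boundary are excluded from the response.
    return [row for row in rows if row.written_at < boundary]
```

With a five-minute staleness value, a row written ten minutes ago is returned, while rows written five minutes ago or one minute ago are left out of the response.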

    Shuffle-less Reclustering of Clustered Tables

    Publication number: US20220374455A1

    Publication date: 2022-11-24

    Application number: US17817147

    Application date: 2022-08-03

    Applicant: Google LLC

    Abstract: A method for shuffle-less reclustering of clustered tables includes receiving a first and second group of clustered data blocks sorted by a clustering key value. A range of clustering key values of one or more of the data blocks in the second group overlaps with the range of clustering key values of a data block in the first group. The method also includes generating split points for partitioning the first and second groups of clustered data blocks into a third group. The method also includes partitioning, using the split points, the first and second groups into the third group. Each data block in the third group includes a range of clustering key values that does not overlap with that of any other data block in the third group. Each split point defines an upper limit or lower limit for the range of clustering key values of a data block in the third group.
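
The role of the split points can be illustrated with a small in-memory sketch. It is only an assumption-laden illustration: blocks are modeled as sorted lists of integer keys, split points are taken from a full merge of all keys (which the patented method presumably avoids), and the names `split_points`, `recluster`, and `target_rows` are hypothetical.

```python
import heapq
from bisect import bisect_right
from typing import List


def split_points(blocks: List[List[int]], target_rows: int) -> List[int]:
    """Pick every `target_rows`-th key of the merged key order as a split point.

    Assumption: each input block is already sorted by clustering key.
    """
    merged = list(heapq.merge(*blocks))
    picked = {merged[i] for i in range(target_rows, len(merged), target_rows)}
    return sorted(picked)


def recluster(first: List[List[int]], second: List[List[int]],
              target_rows: int) -> List[List[int]]:
    """Partition both groups into a third group with non-overlapping ranges."""
    blocks = first + second
    points = split_points(blocks, target_rows)
    third: List[List[int]] = [[] for _ in range(len(points) + 1)]
    for block in blocks:
        for key in block:
            # Each split point is the inclusive lower limit of the next block.
            third[bisect_right(points, key)].append(key)
    return [sorted(block) for block in third if block]
```

Because every key is routed by comparison against the same split points, the largest key in one output block is always smaller than the smallest key in the next.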

    Execution-time dynamic range partitioning transformations

    Publication number: US11423049B2

    Publication date: 2022-08-23

    Application number: US16872238

    Application date: 2020-05-11

    Applicant: Google LLC

    Abstract: A method for execution-time dynamic range partitioning includes receiving user data including a partitioning key and a clustering key. The user data includes a respective number of total rows defining a total data size for the user data. The method also includes identifying storage constraints for the data storage system. The storage constraints include a target file size and a target number of rows per file. The method further includes determining a plurality of split points for the user data based on the storage constraints. The method also includes generating partitioning quantiles from the plurality of split points that define a range between each split point of the plurality of split points. The method further includes range partitioning each row of the user data into files using the partitioning quantiles.
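
One way to read the steps in this abstract is: derive a file count from the storage constraints, choose split points at evenly spaced quantiles of the key order, then route each row by its key. The sketch below follows that reading under stated assumptions: keys are sorted in memory rather than sampled, only a single key is used, and all function and parameter names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, str]  # hypothetical row: (key, payload)


def plan_file_count(total_rows: int, total_bytes: int,
                    target_rows_per_file: int, target_file_bytes: int) -> int:
    """Smallest file count that respects both storage constraints."""
    by_rows = math.ceil(total_rows / target_rows_per_file)
    by_size = math.ceil(total_bytes / target_file_bytes)
    return max(1, by_rows, by_size)


def quantile_split_points(keys: Sequence[int], n_files: int) -> List[int]:
    """Split points at evenly spaced quantiles of the sorted keys."""
    ordered = sorted(keys)
    return [ordered[(i * len(ordered)) // n_files] for i in range(1, n_files)]


def range_partition(rows: Sequence[Row], key_fn: Callable[[Row], int],
                    points: List[int]) -> List[List[Row]]:
    """Place each row in the file whose key range contains its key."""
    files: List[List[Row]] = [[] for _ in range(len(points) + 1)]
    for row in rows:
        files[bisect_right(points, key_fn(row))].append(row)
    return files
```

For ten rows of 100 bytes each, a target of four rows per file and 500 bytes per file gives three files, since the row constraint is the tighter of the two.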

    Metadata Management for a Transactional Storage System

    Publication number: US20210382892A1

    Publication date: 2021-12-09

    Application number: US17445422

    Application date: 2021-08-19

    Applicant: Google LLC

    Inventors: Pavan Edara, Yang Yi

    Abstract: A method for managing metadata for a transactional storage system includes receiving a query request at a snapshot timestamp. The query request requests return of at least one data block from a plurality of data blocks. Each data block includes a corresponding write epoch timestamp and a corresponding conversion indicator indicating whether the data block is active or has been converted at a respective conversion timestamp. The method also includes setting a read epoch timestamp equal to the earliest one of the write epoch timestamps and determining whether any of the respective conversion timestamps occurring at or before the snapshot timestamp occur after the read epoch timestamp. The method also includes determining the at least one data block requested by the query request by scanning each of the data blocks including corresponding write epoch timestamps occurring at or after the read epoch timestamp.
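
The abstract leaves several details open, so the following is only a rough walk-through of its three steps under loudly stated assumptions: timestamps are plain integers, a block is "active" at the snapshot if it was written at or before the snapshot and not converted at or before it, and the read epoch is the earliest write epoch among those active blocks. The `Block` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical data block metadata."""
    block_id: str
    write_epoch: int
    converted_at: Optional[int] = None  # None means the block is still active


def blocks_for_snapshot(blocks: List[Block],
                        snapshot: int) -> Tuple[List[Block], bool]:
    """Return (blocks to read, whether a conversion fell inside the window)."""

    def active(b: Block) -> bool:
        # Assumption: visibility rule for a block at the snapshot timestamp.
        not_converted = b.converted_at is None or b.converted_at > snapshot
        return b.write_epoch <= snapshot and not_converted

    live = [b for b in blocks if active(b)]
    if not live:
        return [], False
    # Assumption: read epoch = earliest write epoch among active blocks.
    read_epoch = min(b.write_epoch for b in live)
    # Did any conversion at or before the snapshot occur after the read epoch?
    conversion_in_window = any(
        b.converted_at is not None and read_epoch < b.converted_at <= snapshot
        for b in blocks
    )
    # Scan only blocks written at or after the read epoch.
    scanned = [b for b in blocks if b.write_epoch >= read_epoch and active(b)]
    return scanned, conversion_in_window
```

The read epoch acts as a lower bound for the scan, so blocks written before it are never examined; what the method does with the conversion check is not stated in the abstract, so the sketch only reports it.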

    Managing Real Time Data Stream Processing

    Publication number: US20210319031A1

    Publication date: 2021-10-14

    Application number: US16848833

    Application date: 2020-04-14

    Applicant: Google LLC

    Abstract: A method for managing data processing includes receiving, from a user of a data query system, a data query for data stored in a data store in communication with the data query system. The method also includes receiving a staleness parameter indicating an upper time boundary for the data query. The upper time boundary limits a query response to data within the data store that is older than the upper time boundary. The method further includes determining whether the data stored within the data store satisfies the staleness parameter. When a portion of the data within the data store fails to satisfy the staleness parameter, the method includes generating the query response that excludes the portion of the data that fails to satisfy the staleness parameter.

    Metadata management for a transactional storage system

    Publication number: US11113296B1

    Publication date: 2021-09-07

    Application number: US16848780

    Application date: 2020-04-14

    Applicant: Google LLC

    Inventors: Pavan Edara, Yang Yi

    Abstract: A method for managing metadata for a transactional storage system includes receiving a query request at a snapshot timestamp. The query request requests return of at least one data block from a plurality of data blocks. Each data block includes a corresponding write epoch timestamp and a corresponding conversion indicator indicating whether the data block is active or has been converted at a respective conversion timestamp. The method also includes setting a read epoch timestamp equal to the earliest one of the write epoch timestamps and determining whether any of the respective conversion timestamps occurring at or before the snapshot timestamp occur after the read epoch timestamp. The method also includes determining the at least one data block requested by the query request by scanning each of the data blocks including corresponding write epoch timestamps occurring at or after the read epoch timestamp.
