-
Publication No.: US11860907B2
Publication date: 2024-01-02
Application No.: US17817147
Application date: 2022-08-03
Applicant: Google LLC
Inventor: Hua Zhang, Pavan Edara, Nhan Nguyen
CPC classification number: G06F16/285, G06F21/64
Abstract: A method for shuffle-less reclustering of clustered tables includes receiving a first and second group of clustered data blocks sorted by a clustering key value. A range of clustering key values of one or more of the data blocks in the second group overlaps with the range of clustering key values of a data block in the first group. The method also includes generating split points for partitioning the first and second groups of clustered data blocks into a third group. The method also includes partitioning, using the split points, the first and second groups into the third group. Each data block in the third group includes a range of clustering key values that does not overlap with that of any other data block in the third group. Each split point defines an upper limit or lower limit for the range of clustering key values of a data block in the third group.
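The role of the split points in the abstract above can be illustrated with a minimal in-memory sketch. It is not the patented method: the function names (`generate_split_points`, `recluster`), the choice of evenly spaced quantiles as split points, and the representation of a data block as a sorted list of keys are all assumptions made for illustration.

```python
from bisect import bisect_right
from heapq import merge


def generate_split_points(blocks, num_output_blocks):
    """Pick evenly spaced key values as split points (an assumed heuristic)."""
    keys = sorted(key for block in blocks for key in block)
    if not keys or num_output_blocks < 2:
        return []
    step = len(keys) / num_output_blocks
    points = []
    for i in range(1, num_output_blocks):
        candidate = keys[int(i * step)]
        # Keep split points strictly increasing so no output range is empty by construction.
        if not points or candidate > points[-1]:
            points.append(candidate)
    return points


def recluster(first_group, second_group, num_output_blocks):
    """Partition two groups of sorted blocks into a third, non-overlapping group."""
    blocks = first_group + second_group
    splits = generate_split_points(blocks, num_output_blocks)
    third_group = [[] for _ in range(len(splits) + 1)]
    # Each input block is already sorted, so a streaming merge keeps output sorted.
    for key in merge(*blocks):
        # A split point s is the inclusive lower limit of the block to its right
        # and the exclusive upper limit of the block to its left.
        third_group[bisect_right(splits, key)].append(key)
    return [block for block in third_group if block]


first_group = [[1, 3, 5], [7, 9, 11]]
second_group = [[4, 6, 8]]  # overlaps both blocks of the first group
third_group = recluster(first_group, second_group, num_output_blocks=3)
```

Because every key equal to a split point is routed to the same output block, the key ranges of the resulting blocks cannot overlap.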
-
Publication No.: US11609909B2
Publication date: 2023-03-21
Application No.: US17315281
Application date: 2021-05-08
Applicant: Google LLC
Inventor: Pavan Edara, Jordan Tigani
IPC: G06F7/00, G06F16/2453, G06F16/22, G06F16/242
Abstract: A computer-implemented method includes receiving a query specifying an operation to perform on a first table of a plurality of data blocks stored. Each data block in the first table includes a respective reference count indicating a number of tables referencing the data block. The method also includes determining that the operation specified by the query includes copying the plurality of data blocks in the first table into a second table and, in response, for each data block of the plurality of data blocks in the first table copied into the second table: incrementing the respective reference count associated with the data block in the first table; and appending, by the data processing hardware, into metadata of the second table, a reference of the corresponding data block copied into the second table.
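The reference-counted copy described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the `DataBlock` and `Table` classes and the `copy_table` function are hypothetical names, and table metadata is modeled as a plain Python list of block references.

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    block_id: str
    ref_count: int = 1  # number of tables currently referencing this block


@dataclass
class Table:
    name: str
    block_refs: list = field(default_factory=list)  # stands in for table metadata


def copy_table(source, dest_name):
    """Copy a table by sharing its blocks instead of duplicating their data."""
    dest = Table(dest_name)
    for block in source.block_refs:
        block.ref_count += 1            # one more table now references the block
        dest.block_refs.append(block)   # record the reference in the new table's metadata
    return dest


first_table = Table("first", [DataBlock("b1"), DataBlock("b2")])
second_table = copy_table(first_table, "second")
```

No block contents are moved in this sketch; the copy only touches reference counts and metadata.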
-
Publication No.: US11580123B2
Publication date: 2023-02-14
Application No.: US17098301
Application date: 2020-11-13
Applicant: Google LLC
Inventor: Pavan Edara, Mosha Pasumansky
IPC: G06F16/2458, G06F16/22, G06F12/0875
Abstract: A method for managing big metadata using columnar techniques includes receiving a query request requesting data blocks from a data table that match query parameters. The data table is associated with system tables that each includes metadata for a corresponding data block of the data table. The method includes generating, based on the query request, a system query to return a subset of rows that correspond to the data blocks that match the query parameters. The method further includes generating, based on the query request and the system query, a final query to return a subset of data blocks from the data table corresponding to the subset of rows. The method also includes determining whether any of the data blocks in the subset of data blocks match the query parameters, and returning the matching data blocks when one or more data blocks match the query parameters.
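The two-stage flow in the abstract above (a system query over block metadata, then a final query over the candidate blocks) can be sketched as below. The abstract does not say what the metadata contains; this sketch assumes per-block minimum and maximum values and a range predicate, and all function names are hypothetical.

```python
def build_system_table(data_blocks):
    """One metadata row per data block (min/max is an assumed metadata layout)."""
    return [
        {"block_id": block_id, "min": min(values), "max": max(values)}
        for block_id, values in data_blocks.items()
    ]


def system_query(system_table, lo, hi):
    """Return the IDs of blocks whose metadata says they may contain matches."""
    return [
        row["block_id"]
        for row in system_table
        if row["max"] >= lo and row["min"] <= hi
    ]


def final_query(data_blocks, candidate_ids, lo, hi):
    """Scan only the candidate blocks and keep those that really match."""
    return [
        block_id
        for block_id in candidate_ids
        if any(lo <= value <= hi for value in data_blocks[block_id])
    ]


data_blocks = {"b1": [1, 2, 3], "b2": [10, 20], "b3": [4, 30]}
system_table = build_system_table(data_blocks)
candidates = system_query(system_table, lo=5, hi=9)
matching = final_query(data_blocks, candidates, lo=5, hi=9)
```

Metadata can only rule blocks out, so a candidate such as `b3` in this example still has to be checked against the actual data before it is returned.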
-
Publication No.: US11520796B2
Publication date: 2022-12-06
Application No.: US16848833
Application date: 2020-04-14
Applicant: Google LLC
Inventor: Pavan Edara, Jonathan Forbes, Yang Yi
IPC: G06F16/22, G06F16/2457, G06F16/2458, G06F16/23, G06F16/248
Abstract: A method for managing data processing includes receiving, from a user of a data query system, a data query for data stored in a data store in communication with the data query system. The method also includes receiving a staleness parameter indicating an upper time boundary for the data query. The upper time boundary limits a query response to data within the data store that is older than the upper time boundary. The method further includes determining whether the data stored within the data store satisfies the staleness parameter. When a portion of the data within the data store fails to satisfy the staleness parameter, the method includes generating the query response that excludes the portion of the data that fails to satisfy the staleness parameter.
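One possible reading of the staleness parameter in the abstract above is sketched here. It assumes the parameter is given as a duration, that the upper time boundary is the current time minus that duration, and that each row carries an ingestion timestamp; the field name `ingested_at` and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def query_with_staleness(rows, predicate, staleness, now):
    """Answer a query using only rows older than the upper time boundary."""
    upper_time_boundary = now - staleness
    return [
        row
        for row in rows
        # Rows at or after the boundary fail the staleness check and are excluded.
        if row["ingested_at"] < upper_time_boundary and predicate(row)
    ]


now = datetime(2024, 1, 1, 12, 0, 0)
rows = [
    {"id": 1, "ingested_at": datetime(2024, 1, 1, 11, 0, 0)},
    {"id": 2, "ingested_at": datetime(2024, 1, 1, 11, 50, 0)},
    {"id": 3, "ingested_at": datetime(2024, 1, 1, 11, 59, 0)},
]
response = query_with_staleness(rows, lambda row: True, timedelta(minutes=5), now)
```

In this example the boundary is 11:55, so the row ingested at 11:59 is left out of the response.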
-
Publication No.: US20220156260A1
Publication date: 2022-05-19
Application No.: US17098301
Application date: 2020-11-13
Applicant: Google LLC
Inventor: Pavan Edara, Mosha Pasumansky
IPC: G06F16/2458, G06F16/22, G06F12/0875
Abstract: A method for managing big metadata using columnar techniques includes receiving a query request requesting data blocks from a data table that match query parameters. The data table is associated with system tables that each includes metadata for a corresponding data block of the data table. The method includes generating, based on the query request, a system query to return a subset of rows that correspond to the data blocks that match the query parameters. The method further includes generating, based on the query request and the system query, a final query to return a subset of data blocks from the data table corresponding to the subset of rows. The method also includes determining whether any of the data blocks in the subset of data blocks match the query parameters, and returning the matching data blocks when one or more data blocks match the query parameters.
-
Publication No.: US12259800B2
Publication date: 2025-03-25
Application No.: US18391229
Application date: 2023-12-20
Applicant: Google LLC
Inventor: Pavan Edara, Reuven Lax, Ji Yang, Gurpreet Singh Nanda
Abstract: A method for processing data exactly once using transactional stream writes includes receiving, from a client, a batch of data blocks for storage on memory hardware in communication with the data processing hardware. The batch of data blocks is associated with a corresponding sequence number and represents a number of rows of a table stored on the memory hardware. The method also includes partitioning the batch of data blocks into a plurality of sub-batches of data blocks. For each sub-batch of data blocks, the method further includes assigning the sub-batch of data blocks to a buffered stream; writing, using the assigned buffered stream, the sub-batch of data blocks to the memory hardware; updating a storage log with an intent to commit the sub-batch of data blocks using the assigned buffered stream; and committing the sub-batch of data blocks to the memory hardware.
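The write path in the abstract above (partition, assign to a buffered stream, write, log intent, commit) can be sketched in memory as follows. The abstract does not state how the sequence number is used; treating it as a key for rejecting repeated batches is an assumption of this sketch, as are the `Storage` class, the round-robin stream assignment, and the fixed sub-batch size.

```python
class Storage:
    """In-memory stand-in for the memory hardware, its streams, and its log."""

    def __init__(self, num_streams):
        self.streams = [[] for _ in range(num_streams)]  # buffered, uncommitted rows
        self.log = []                                    # intent-to-commit records
        self.committed = []                              # rows visible to readers
        self.seen_sequence_numbers = set()

    def write_batch(self, sequence_number, rows, sub_batch_size):
        # Assumed use of the sequence number: a retried batch is ignored.
        if sequence_number in self.seen_sequence_numbers:
            return False
        sub_batches = [
            rows[i:i + sub_batch_size] for i in range(0, len(rows), sub_batch_size)
        ]
        for index, sub_batch in enumerate(sub_batches):
            stream_id = index % len(self.streams)         # assign a buffered stream
            buffer = self.streams[stream_id]
            buffer.extend(sub_batch)                      # write via the stream
            self.log.append(("intent", sequence_number, index, stream_id))
            self.committed.extend(buffer)                 # commit the sub-batch
            buffer.clear()
        self.seen_sequence_numbers.add(sequence_number)
        return True


storage = Storage(num_streams=2)
storage.write_batch(sequence_number=1, rows=["r1", "r2", "r3", "r4", "r5"], sub_batch_size=2)
storage.write_batch(sequence_number=1, rows=["r1", "r2", "r3", "r4", "r5"], sub_batch_size=2)
```

The second call is a retry of the same batch and leaves the committed rows unchanged.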
-
Publication No.: US12026168B2
Publication date: 2024-07-02
Application No.: US18166056
Application date: 2023-02-08
Applicant: Google LLC
Inventor: Pavan Edara, Mosha Pasumansky
IPC: G06F16/2458, G06F12/0875, G06F16/22
CPC classification number: G06F16/2465, G06F12/0875, G06F16/221, G06F2212/45
Abstract: A method for managing big metadata using columnar techniques includes receiving a query request requesting data blocks from a data table that match query parameters. The data table is associated with system tables that each includes metadata for a corresponding data block of the data table. The method includes generating, based on the query request, a system query to return a subset of rows that correspond to the data blocks that match the query parameters. The method further includes generating, based on the query request and the system query, a final query to return a subset of data blocks from the data table corresponding to the subset of rows. The method also includes determining whether any of the data blocks in the subset of data blocks match the query parameters, and returning the matching data blocks when one or more data blocks match the query parameters.
-
Publication No.: US20230185816A1
Publication date: 2023-06-15
Application No.: US18166056
Application date: 2023-02-08
Applicant: Google LLC
Inventor: Pavan Edara, Mosha Pasumansky
IPC: G06F16/2458, G06F16/22, G06F12/0875
CPC classification number: G06F16/2465, G06F16/221, G06F12/0875, G06F2212/45
Abstract: A method for managing big metadata using columnar techniques includes receiving a query request requesting data blocks from a data table that match query parameters. The data table is associated with system tables that each includes metadata for a corresponding data block of the data table. The method includes generating, based on the query request, a system query to return a subset of rows that correspond to the data blocks that match the query parameters. The method further includes generating, based on the query request and the system query, a final query to return a subset of data blocks from the data table corresponding to the subset of rows. The method also includes determining whether any of the data blocks in the subset of data blocks match the query parameters, and returning the matching data blocks when one or more data blocks match the query parameters.
-
Publication No.: US20220365914A1
Publication date: 2022-11-17
Application No.: US17876660
Application date: 2022-07-29
Applicant: Google LLC
Inventor: Pavlo Padinker, Pavan Edara, Bigang Li
IPC: G06F16/215, G06F16/22, G06F16/23, G06F12/0804
Abstract: The present disclosure describes a service which provides primary in-line deduplication. A streaming application program interface (API) may allow for streaming records into a storage system with high throughput and low latency. As part of this process, the API allows a user to add identifiers as a field used for data deduplication. The deduplication service keeps a moving window of the identifiers in memory and does in-line deduplication by quickly determining whether data is a duplicate. Keeping only deduplication keys in memory reduces the cost of running the service. Moreover, the real-time nature of the moving window approach allows for storing deduplication information alongside the data and accessing it immediately on read. In this regard, read-after-write consistency is supported, and costs are reduced.
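The moving window of identifiers described in the abstract above can be sketched as a small in-memory structure. This is an illustration only: the `WindowDeduplicator` class, the time-based window, and the requirement that timestamps arrive in non-decreasing order are assumptions, not details taken from the disclosure.

```python
from collections import OrderedDict


class WindowDeduplicator:
    """Keeps only deduplication keys, not record payloads, in memory."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.seen = OrderedDict()  # record_id -> time first seen, oldest first

    def is_duplicate(self, record_id, now):
        # Slide the window: drop identifiers first seen too long ago.
        # Assumes `now` never decreases between calls.
        while self.seen:
            oldest_time = next(iter(self.seen.values()))
            if now - oldest_time <= self.window_seconds:
                break
            self.seen.popitem(last=False)
        if record_id in self.seen:
            return True
        self.seen[record_id] = now
        return False


dedup = WindowDeduplicator(window_seconds=60)
first = dedup.is_duplicate("a", now=0)     # new identifier
repeat = dedup.is_duplicate("a", now=10)   # still inside the window
```

An identifier that reappears after it has left the window is treated as new, which is the trade-off for bounding memory use.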
-
Publication No.: US11436261B2
Publication date: 2022-09-06
Application No.: US16848810
Application date: 2020-04-14
Applicant: Google LLC
Inventor: Hua Zhang, Pavan Edara, Nhan Nguyen
Abstract: A method for shuffle-less reclustering of clustered tables includes receiving a first and second group of clustered data blocks sorted by a clustering key value. A range of clustering key values of one or more of the data blocks in the second group overlaps with the range of clustering key values of a data block in the first group. The method also includes generating split points for partitioning the first and second groups of clustered data blocks into a third group. The method also includes partitioning, using the split points, the first and second groups into the third group. Each data block in the third group includes a range of clustering key values that does not overlap with that of any other data block in the third group. Each split point defines an upper limit or lower limit for the range of clustering key values of a data block in the third group.