Independent data processing environments within a big data cluster system

    Publication No.: US09959337B2

    Publication date: 2018-05-01

    Application No.: US15485952

    Filing date: 2017-04-12

    CPC classification number: G06F17/30598 G06F9/5033 G06F9/5072 G06F2209/505

    Abstract: A cluster system includes an interface and a processor. The interface is to receive a request from a user associated with one of a plurality of shells. The processor is to determine a plurality of tasks to respond to the request; determine a local set of data and a shared set of data for a task of the plurality of tasks, wherein the local set of data is associated with the one of the plurality of shells; and provide the task, a local set indication, and a shared set indication to a worker associated with the task, wherein the local set indication refers to the local set of data and the shared set indication refers to the shared set of data.
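
A minimal sketch of the hand-off described in this abstract, assuming that the "indications" are plain storage paths. The names `TaskAssignment` and `plan_tasks` and the path layout are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskAssignment:
    task: str
    local_set: str   # indication of data private to the requesting shell (assumed to be a path)
    shared_set: str  # indication of data visible to every shell (assumed to be a path)


def plan_tasks(request, shell, partitions):
    """Split a request into tasks and attach both data indications to each one."""
    return [
        TaskAssignment(
            task=f"{request}#part-{p}",
            local_set=f"/shells/{shell}/data",  # hypothetical layout
            shared_set="/shared/data",          # hypothetical layout
        )
        for p in range(partitions)
    ]


assignments = plan_tasks("count-rows", shell="shell-a", partitions=2)
```

Each worker would receive one `TaskAssignment` and resolve the two indications itself, so that data local to one shell stays separate from the shared set.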

    System for exploring data in a database

    Publication No.: US09760602B1

    Publication date: 2017-09-12

    Application No.: US14621950

    Filing date: 2015-02-13

    CPC classification number: G06F17/30424 G06F17/30389

    Abstract: A system for exploring data in a database comprises a query parser, a parameter manager, a query submitter, and a result formatter. The query parser is to receive a base query and determine an input parameter from the base query. The parameter manager is to provide a first request for a value for the input parameter; receive the value for the input parameter; and provide a second request for the value for the input parameter. The query submitter is to determine a first query using the base query and the value for the input parameter; and provide an indication to execute the first query. The result formatter is to receive a result associated with the indication to execute the first query.
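
The parse, prompt, submit, and format flow in this abstract can be sketched as follows. This is an illustration only: it assumes input parameters are written as `:name` placeholders and uses an in-memory SQLite table; the function names and the `orders` table are hypothetical.

```python
import re
import sqlite3

# Assumption: input parameters appear as :name placeholders in the base query.
PARAM_PATTERN = re.compile(r"(?<!:):([A-Za-z_]\w*)")


def parse_parameters(base_query):
    """Query parser: distinct parameter names in order of first appearance."""
    names = []
    for name in PARAM_PATTERN.findall(base_query):
        if name not in names:
            names.append(name)
    return names


def run_query(conn, base_query, values):
    """Query submitter and result formatter: bind values, return rows as dicts."""
    cursor = conn.execute(base_query, values)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "west", 25.0), (3, "east", 40.0)],
)

base_query = "SELECT id, total FROM orders WHERE region = :region ORDER BY id"
params = parse_parameters(base_query)

# The same base query is re-run with a new value each time the user is asked again.
first = run_query(conn, base_query, {"region": "east"})
second = run_query(conn, base_query, {"region": "west"})
```

The regular expression is a simplification: it does not skip placeholders that appear inside SQL string literals or comments.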

    Independent data processing environments within a big data cluster system

    Publication No.: US09659081B1

    Publication date: 2017-05-23

    Application No.: US14824989

    Filing date: 2015-08-12

    CPC classification number: G06F17/30598 G06F9/5033 G06F9/5072 G06F2209/505

    Abstract: A cluster system includes an interface and a processor. The interface is to receive a request from a user associated with one of a plurality of shells. The processor is to determine a plurality of tasks to respond to the request; determine a local set of data and a shared set of data for a task of the plurality of tasks, wherein the local set of data is associated with the one of the plurality of shells; and provide the task, a local set indication, and a shared set indication to a worker associated with the task, wherein the local set indication refers to the local set of data and the shared set indication refers to the shared set of data.

    SHORT QUERY PRIORITIZATION FOR DATA PROCESSING SERVICE

    Publication No.: US20250156414A1

    Publication date: 2025-05-15

    Application No.: US18991083

    Filing date: 2024-12-20

    Abstract: A cluster computing system maintains a first set of queues for short queries and a second set of queues for longer queries. The first set is allocated a majority of the cluster's processing resources and processes queries on a first-in, first-out basis. The second set is allocated a minority of the cluster's processing resources, which are shared among the queries in the second set. The system initially assigns each query to the first set of queues for a fixed amount of resource time. While a query is processing, the system monitors the query's resource time and reassigns the query to the second set of queues if the query has not completed within the allotted amount of resource time. Thus, short queries receive the resources they need to complete quickly without getting stuck behind longer queries, while longer queries continue to make progress.
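
The demotion policy in this abstract can be illustrated with a small tick-based simulation. It is a sketch under stated assumptions, not the patented scheduler: one FIFO queue stands in for the first set, one shared pool for the second, and `budget`, `fast_units`, and `slow_units` are hypothetical parameters measured in abstract work units.

```python
from collections import deque


def simulate(queries, budget, fast_units=3, slow_units=1):
    """Return query names in completion order.

    queries: list of (name, work_units) in arrival order.
    Each tick the fast lane spends up to fast_units on the head of its FIFO
    queue, and the slow lane splits slow_units evenly across its queries.
    """
    fast = deque({"name": n, "remaining": w, "used": 0} for n, w in queries)
    slow = []
    finished = []
    while fast or slow:
        # Slow lane: the minority share is divided among all demoted queries.
        if slow:
            share = slow_units / len(slow)
            for query in slow:
                query["remaining"] -= share
            finished += [q["name"] for q in slow if q["remaining"] <= 0]
            slow = [q for q in slow if q["remaining"] > 0]
        # Fast lane: first in, first out, capped by the per-query budget.
        if fast:
            head = fast[0]
            step = min(fast_units, head["remaining"], budget - head["used"])
            head["remaining"] -= step
            head["used"] += step
            if head["remaining"] <= 0:
                finished.append(fast.popleft()["name"])
            elif head["used"] >= budget:
                slow.append(fast.popleft())  # budget exhausted: demote
    return finished


workload = [("long", 20), ("short-1", 2), ("short-2", 3)]
with_demotion = simulate(workload, budget=6)
without_demotion = simulate(workload, budget=100)
```

With a budget of 6 the long query is demoted after two ticks and both short queries finish before it; with an effectively unlimited budget the short queries wait behind the long one.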

    CONCURRENT OPTIMISTIC TRANSACTIONS FOR TABLES WITH DELETION VECTORS

    Publication No.: US20250103580A1

    Publication date: 2025-03-27

    Application No.: US18928982

    Filing date: 2024-10-28

    Abstract: A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table, and receives a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records, and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. If the prerequisites are satisfied, the configuration commits the second transaction to generate a third version of the data table by updating elements of a deletion vector.
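
A toy model of the two prerequisite checks, assuming each record is identified by a (file name, row position) pair and each data file carries a deletion vector modeled as a set of positions. The function `try_commit` and this data layout are hypothetical simplifications, not the claimed method.

```python
def try_commit(current_files, committed_rows, pending_rows):
    """Try to commit the second transaction on top of the second table version.

    current_files: dict of file name -> set of deleted row positions
                   (the deletion vectors of the second version).
    committed_rows: set of (file, position) changed by the first transaction.
    pending_rows: set of (file, position) the second transaction updates.
    Returns the deletion vectors of the third version, or None on conflict.
    """
    # Logical prerequisite: the first transaction changed none of our records.
    logical_ok = committed_rows.isdisjoint(pending_rows)
    # Physical prerequisite: our records still live in files of the current version.
    physical_ok = all(name in current_files for name, _ in pending_rows)
    if not (logical_ok and physical_ok):
        return None
    # Commit by updating deletion vector entries; data files are not rewritten.
    new_version = {name: set(vector) for name, vector in current_files.items()}
    for name, position in pending_rows:
        new_version[name].add(position)
    return new_version


version_2 = {"part-0.parquet": {1}, "part-1.parquet": set()}
first_txn = {("part-0.parquet", 1)}

committed = try_commit(version_2, first_txn, {("part-0.parquet", 4), ("part-1.parquet", 0)})
logical_conflict = try_commit(version_2, first_txn, {("part-0.parquet", 1)})
physical_conflict = try_commit(version_2, first_txn, {("part-9.parquet", 0)})
```

A real system would also have to handle files rewritten between versions and make the final commit atomic; both are omitted here.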

    Reducing cluster start up time
    Patent type: Granted invention patent

    Publication No.: US12248818B1

    Publication date: 2025-03-11

    Application No.: US17514988

    Filing date: 2021-10-29

    Abstract: The present application discloses a method, system, and computer system for starting up and maintaining a cluster in a warmed up state, and/or allocating clusters from a warmed up state. The method includes instantiating a set of virtual machines, wherein instantiating the set of virtual machines includes setting a temporary security credential for each virtual machine of the set of virtual machines; receiving a virtual machine allocation request associated with a workspace, a customer, or a tenant; and, in response to the virtual machine allocation request, allocating a virtual machine, wherein allocating the virtual machine comprises replacing the temporary security credential with a security credential associated with the workspace, the customer, or the tenant.
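
The warm-pool idea can be sketched as below, assuming credentials are opaque strings. `WarmPool`, `VirtualMachine`, and the `temp-` prefix are hypothetical names used only for illustration.

```python
import secrets
from collections import deque
from dataclasses import dataclass


@dataclass
class VirtualMachine:
    vm_id: str
    credential: str


class WarmPool:
    """Pre-started machines that hold only a temporary credential."""

    def __init__(self, size):
        self._idle = deque(
            VirtualMachine(f"vm-{i}", "temp-" + secrets.token_hex(8))
            for i in range(size)
        )

    def idle_count(self):
        return len(self._idle)

    def allocate(self, tenant_credential):
        """Hand out a warm machine after swapping in the tenant's credential."""
        if not self._idle:
            raise RuntimeError("warm pool is empty")
        vm = self._idle.popleft()
        vm.credential = tenant_credential  # replaces the temporary credential
        return vm


pool = WarmPool(size=2)
vm = pool.allocate("workspace-42-credential")
```

Because the machine is already running when the request arrives, allocation costs only the credential swap rather than a full boot.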

    Dictionary filtering and evaluation in columnar databases

    Publication No.: US12242485B2

    Publication date: 2025-03-04

    Application No.: US18162616

    Filing date: 2023-01-31

    Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator and a request to return information about a value of interest in a columnar dataset stored on cloud storage. At least one column in the columnar dataset is based on a dictionary. The dictionary maps one or more values for a column to one or more respective identifiers. The method determines whether to perform dictionary filtering for the query by calculating a metric based on one or more factors. Responsive to the metric being below a threshold, which may be predetermined, the method performs the dictionary filtering.
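
A minimal sketch of dictionary filtering with a threshold check. The abstract does not say which factors feed the metric, so the ratio of dictionary size to row count used here is an assumption, as are the function name and the default threshold.

```python
def dictionary_filter(dictionary, codes, predicate, threshold=0.5):
    """Return (matching row indexes, whether dictionary filtering was used).

    dictionary: list whose index is the identifier and whose item is the value.
    codes: one identifier per row of the encoded column.
    """
    # Assumed metric: distinct values per row. A small dictionary relative to
    # the row count makes filtering on the dictionary worthwhile.
    metric = len(dictionary) / max(len(codes), 1)
    if metric < threshold:
        # Evaluate the predicate once per distinct value, then compare identifiers.
        matching_ids = {i for i, value in enumerate(dictionary) if predicate(value)}
        rows = [row for row, code in enumerate(codes) if code in matching_ids]
        return rows, True
    # Fallback: decode each row and evaluate the predicate on it.
    rows = [row for row, code in enumerate(codes) if predicate(dictionary[code])]
    return rows, False


colors = ["red", "green", "blue"]
is_blue = lambda value: value == "blue"

many_rows = dictionary_filter(colors, [0, 1, 1, 2, 0, 2, 2, 1], is_blue)
few_rows = dictionary_filter(colors, [0, 1, 2], is_blue)
```

Both paths return the same rows; the metric only decides which path is taken.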

    Data lineage tracking
    Patent type: Granted invention patent

    Publication No.: US12242441B1

    Publication date: 2025-03-04

    Application No.: US18162562

    Filing date: 2023-01-31

    Abstract: The present application discloses a method, system, and computer system for managing lineage data for data entities. The method includes generating lineage data, and storing and indexing, in a data structure, the lineage data in association with a selected data entity. Generating the lineage data includes selecting the selected data entity, obtaining a query tree that was used to generate the selected data entity, and determining lineage data for the selected data entity based at least in part on the query tree.
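
One way to picture lineage derived from a query tree, assuming the tree is a nested dictionary whose leaves are table scans. The node shape (`op`, `table`, `children`) and the in-memory `lineage_index` are hypothetical.

```python
def source_tables(node):
    """Collect the names of all tables scanned anywhere under this node."""
    if node["op"] == "scan":
        return {node["table"]}
    tables = set()
    for child in node.get("children", []):
        tables |= source_tables(child)
    return tables


# Index from data entity name to its upstream tables (assumed storage shape).
lineage_index = {}


def record_lineage(entity, query_tree):
    """Determine lineage from the query tree and store it under the entity."""
    lineage_index[entity] = sorted(source_tables(query_tree))


query_tree = {
    "op": "join",
    "children": [
        {"op": "scan", "table": "orders"},
        {"op": "filter", "children": [{"op": "scan", "table": "customers"}]},
    ],
}
record_lineage("daily_revenue", query_tree)
```

After the call, looking up `daily_revenue` in `lineage_index` yields the tables it was built from.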

    Evaluating expressions over dictionary data

    Publication No.: US12210528B2

    Publication date: 2025-01-28

    Application No.: US18162607

    Filing date: 2023-01-31

    Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator for a columnar dataset on cloud storage. At least one column in the dataset is based on a dictionary, and the dictionary maps one or more values for a column to one or more respective identifiers. The method evaluates the operator on one or more values of the dictionary to generate an updated dictionary comprising updated values. The method may decode the updated dictionary into an updated column comprising updated data values.
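
The idea of evaluating an operator on the dictionary rather than on every row can be sketched as follows, assuming a list-based dictionary where the index serves as the identifier. The function names and the `shout` operator are hypothetical.

```python
def evaluate_on_dictionary(dictionary, operator):
    """Apply the operator once per distinct value to build an updated dictionary."""
    return [operator(value) for value in dictionary]


def decode(dictionary, codes):
    """Expand identifiers back into a column of values."""
    return [dictionary[code] for code in codes]


dictionary = ["red", "green", "blue"]
codes = [0, 1, 1, 2, 0, 2, 2, 1]

calls = []


def shout(value):
    calls.append(value)  # record each call to show how often the operator runs
    return value.upper()


updated_dictionary = evaluate_on_dictionary(dictionary, shout)
updated_column = decode(updated_dictionary, codes)
```

The operator runs three times, once per dictionary entry, instead of eight times, once per row; the identifiers in `codes` are reused unchanged.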
