-
Publication No.: US12210521B2
Publication Date: 2025-01-28
Application No.: US18140323
Application Date: 2023-04-27
Applicant: Databricks, Inc.
Inventor: Venkata Sai Akhil Gudesa , Herman Rudolf Petrus Catharina van Hövell tot Westerflier , Supun Chathuranga Nakandala
IPC: G06F16/24 , G06F9/48 , G06F11/34 , G06F16/2453 , G06F16/28
Abstract: A cluster computing system maintains a first set of queues for short queries and a second set of queues for longer queries. The first set is allocated a majority of the cluster's processing resources and processes queries on a first-in, first-out basis. The second set is allocated a minority of the cluster's processing resources, which are shared among the queries in the second set. The system initially assigns each query to the first set of queues with a fixed budget of resource time. While a query is processing, the system monitors the query's resource time and reassigns the query to the second set of queues if the query has not completed within the allotted resource time. Thus, short queries receive the resources needed to complete quickly without getting stuck behind longer queries, while longer queries continue to make progress.
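A minimal sketch of the two-tier queueing idea described in this abstract, written in Python. The class and parameter names (`TwoTierScheduler`, `resource_time_budget`) are invented for illustration, and wall-clock elapsed time stands in for the accumulated "resource time" the abstract refers to; this is not the patented implementation.

```python
import time
from collections import deque


class TwoTierScheduler:
    """Sketch: short queries run FIFO on most of the cluster's resources;
    a query that exceeds a fixed resource-time budget is demoted to a
    shared pool that receives the remaining minority of resources."""

    def __init__(self, resource_time_budget: float):
        self.short_queue = deque()   # FIFO tier, majority of resources
        self.long_pool = set()       # fair-shared tier, minority of resources
        self.budget = resource_time_budget

    def submit(self, query_id: str) -> None:
        # Every query starts in the short-query (FIFO) tier.
        self.short_queue.append((query_id, time.monotonic()))

    def monitor(self) -> None:
        # Demote any query whose elapsed time (a stand-in for resource time)
        # has exceeded the fixed budget; it keeps making progress in the pool.
        still_short = deque()
        for query_id, started in self.short_queue:
            if time.monotonic() - started > self.budget:
                self.long_pool.add(query_id)
            else:
                still_short.append((query_id, started))
        self.short_queue = still_short
```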
-
Publication No.: US12204510B2
Publication Date: 2025-01-21
Application No.: US18144647
Application Date: 2023-05-08
Applicant: Databricks, Inc.
Inventor: Vijayan Prabhakaran , Himanshu Raja , Rahul Potharaju , Naga Raju Bhanoori , Lin Ma , Rajesh Parangi Sharabhalingappa , Jintian Liang , Zachary Vaughn Schuermann , Kam Cheung Ting
Abstract: Disclosed is a configuration for managing the organization of data tables in cloud-based storage. The configuration receives one or more metrics for data processing operations on a data table. The metrics include at least one of a size of the data table, a size of each file in the data table, and metadata describing the data table. The configuration automatically executes a cost-benefit analysis, based on the one or more metrics, for each candidate maintenance operation in a plurality of candidate maintenance operations. The configuration automatically selects a maintenance operation to automate from the candidate maintenance operations based on the cost-benefit analysis. The selected maintenance operation is automated and scheduled on the data table.
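A small sketch of the cost-benefit selection step described above, under assumed names (`Candidate`, `select_maintenance_op`) and toy cost/benefit models; the actual metrics and scoring used by the patented configuration are not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    name: str                              # e.g. "compaction" (illustrative)
    cost: Callable[[dict], float]          # estimated cost from table metrics
    benefit: Callable[[dict], float]       # estimated benefit from table metrics


def select_maintenance_op(metrics: dict, candidates: list) -> Optional[str]:
    """Sketch: pick the candidate whose estimated benefit most exceeds its cost."""
    best, best_score = None, 0.0
    for c in candidates:
        score = c.benefit(metrics) - c.cost(metrics)
        if score > best_score:
            best, best_score = c, score
    return best.name if best else None


# Hypothetical usage with invented metrics and a single candidate operation.
metrics = {"table_size": 10_000_000_000, "avg_file_size": 2_000_000, "num_files": 5_000}
ops = [Candidate("compaction",
                 cost=lambda m: m["num_files"] * 0.001,
                 benefit=lambda m: max(0.0, 128_000_000 - m["avg_file_size"]) / 1e6)]
print(select_maintenance_op(metrics, ops))  # -> "compaction"
```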
-
Publication No.: US12197400B1
Publication Date: 2025-01-14
Application No.: US18473992
Application Date: 2023-09-25
Applicant: Databricks, Inc.
Inventor: William Chau , Abhijit Chakankar , Stephen Michael Mahoney , Daniel Seth Morris , Itai Shlomo Weiss
Abstract: A data processing service receives a request from a first collaborator to create a clean room for data sharing collaboration with at least a second collaborator. In response, the data processing service creates an execution environment separate from the data environment of the first collaborator and the second collaborator. The first and second collaborators can then add content into the clean room in the form of data tables and executable notebooks. Approval from each collaborator is required before a notebook can be executed using any data table shared into the clean room. Upon receiving notebook approval from each collaborator, the data processing service creates a notebook job to execute the notebook on one or more cluster computing resources of the data processing service to generate an output.
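A toy sketch of the approval gate described in this abstract: a notebook job may run only after every collaborator has approved it. The class and method names are hypothetical, and scheduling a real cluster job is replaced by a placeholder return value.

```python
class CleanRoom:
    """Sketch: content is shared into the room, but a notebook executes
    only once every collaborator has approved it."""

    def __init__(self, collaborators: set):
        self.collaborators = set(collaborators)
        self.tables = {}       # (owner, name) -> shared data table
        self.approvals = {}    # notebook -> set of approving collaborators

    def add_table(self, owner: str, name: str, table) -> None:
        self.tables[(owner, name)] = table

    def approve(self, collaborator: str, notebook: str) -> None:
        self.approvals.setdefault(notebook, set()).add(collaborator)

    def run(self, notebook: str) -> str:
        if self.approvals.get(notebook, set()) != self.collaborators:
            raise PermissionError("all collaborators must approve before execution")
        return f"scheduled notebook job for {notebook}"  # stand-in for a cluster job
```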
-
Publication No.: US12189628B2
Publication Date: 2025-01-07
Application No.: US18162366
Application Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Prashanth Menon , Alexander Behm , Sriram Krishnamurthy
IPC: G06F16/00 , G06F16/2453 , G06F16/28
Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed; determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics; indicating to process the first file using the first processing engine; determining whether a particular error in processing the first file using the first processing engine has been detected; in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine; and storing in memory information obtained from processing the first file by one or more of the first processing engine and the second processing engine.
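A minimal sketch of the engine-fallback pattern described above. The two engines, the specific error type, and the ASCII-only limitation of the fast path are all invented for illustration; the abstract does not disclose which engines or which error are used.

```python
class UnsupportedFeatureError(Exception):
    """Stand-in for the 'particular error' that triggers the fallback."""


class FastEngine:
    def parse(self, data: bytes) -> list:
        # Hypothetical fast path that only handles plain ASCII records.
        if any(b > 127 for b in data):
            raise UnsupportedFeatureError("non-ASCII content")
        return data.decode("ascii").splitlines()


class GeneralEngine:
    def parse(self, data: bytes) -> list:
        # Slower, more general path that handles everything.
        return data.decode("utf-8", errors="replace").splitlines()


def parse_file(data: bytes) -> list:
    """Sketch: begin with the fast engine; on the particular error,
    stop and continue with the second engine."""
    try:
        return FastEngine().parse(data)
    except UnsupportedFeatureError:
        return GeneralEngine().parse(data)


print(parse_file("id,value\n1,café\n".encode("utf-8")))
```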
-
Publication No.: US12189607B2
Publication Date: 2025-01-07
Application No.: US18236516
Application Date: 2023-08-22
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust , Shixiong Zhu , Burak Yavuz
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to: determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with the transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is no overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction log associated with a further position N+2.
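A compact sketch of the optimistic-commit loop this abstract describes: try to commit at position N+1, and if a simultaneous transaction already occupies that position, commit at a later position only when its updated files do not overlap the current transaction's read set. The log representation and function signature are assumptions for illustration.

```python
def try_commit(log: list, read_position: int,
               read_set: set, updates: set) -> int:
    """Sketch: commit at N+1 if free; otherwise check the simultaneous
    transaction's updated files against our read set and retry at N+2, ..."""
    attempt = read_position + 1            # next position N+1
    while True:
        if attempt == len(log):            # slot is free: commit here
            log.append({"position": attempt, "updated_files": set(updates)})
            return attempt
        concurrent = log[attempt]          # a simultaneous transaction exists
        if read_set & concurrent["updated_files"]:
            raise RuntimeError("conflict: read set overlaps concurrent updates")
        attempt += 1                       # no overlap: try the next position


# Hypothetical usage: we read at position 0, a concurrent commit took
# position 1 but touched files we did not read, so we commit at position 2.
log = [{"position": 0, "updated_files": set()},
       {"position": 1, "updated_files": {"file_b"}}]
print(try_commit(log, read_position=0, read_set={"file_a"}, updates={"file_a"}))
```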
-
Publication No.: US20250005076A1
Publication Date: 2025-01-02
Application No.: US18658418
Application Date: 2024-05-08
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust , Andreas Neumann , Mukul Murthy , Jonathan Mio
IPC: G06F16/901 , G06F16/215 , G06F16/22 , G06F16/245
Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries. The processor is coupled to the communication interface and is configured to: determine dependencies of each query in the set of queries on another query; determine a directed acyclic graph (DAG) of nodes based at least in part on the dependencies; insert a node in the DAG of nodes to generate an updated DAG to enforce an expectation; determine a dataflow graph based on the updated DAG; and provide the dataflow graph.
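A small sketch of the DAG-building and expectation-insertion steps described above. The dictionary-of-dependencies representation and the function names are illustrative assumptions; the abstract does not specify how the graph or the expectation nodes are represented.

```python
def build_dag(queries: dict) -> dict:
    """Sketch: each key is a query/node name; its value is the set of
    query names it reads from (its dependencies)."""
    return {name: set(deps) for name, deps in queries.items()}


def insert_expectation(dag: dict, node: str, check: str) -> dict:
    """Sketch: route every consumer of `node` through an inserted
    expectation node so the check is enforced before downstream queries run."""
    updated = {n: {check if d == node else d for d in deps}
               for n, deps in dag.items()}
    updated[check] = {node}
    return updated


# Hypothetical usage with three queries and one data-quality expectation.
dag = build_dag({"raw": set(), "cleaned": {"raw"}, "report": {"cleaned"}})
dag = insert_expectation(dag, "cleaned", "expect_no_nulls(cleaned)")
print(dag)
```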
-
Publication No.: US12182292B1
Publication Date: 2024-12-31
Application No.: US18162353
Application Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Matei Zaharia , Shixiong Zhu , Xiaotong Sun , Ramesh Chandra , Michael Paul Armbrust , Ali Ghodsi
Abstract: The present application discloses a method, system, and computer system for providing access to data. The method includes receiving, by a data manager service from a data requesting service, a request using an identifier for a high-level data object to access a set of data associated with the high-level data object; determining, by the data manager service, one or more low-level data objects corresponding to the set of data based on the identifier for the high-level data object; determining whether a user associated with the request has permission to access at least a subset of the one or more low-level data objects; and in response to determining that the user associated with the request has permission, generating, by the data manager service, a uniform resource locator (URL) via which the at least the subset of the one or more low-level data objects is accessible to the user.
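A minimal sketch of the resolution-then-authorization flow described above: resolve a high-level identifier to low-level objects, check the user's permission, and hand back URLs. The catalog, permission table, and URL-signing function are all hypothetical stand-ins.

```python
# Hypothetical catalog mapping a high-level data object to its low-level files,
# and a toy permission table keyed by (user, high-level object).
CATALOG = {"sales_table": ["s3://bucket/sales/part-000.parquet",
                           "s3://bucket/sales/part-001.parquet"]}
PERMISSIONS = {("alice", "sales_table"): True}


def sign_url(path: str) -> str:
    return path + "?signature=placeholder"   # stand-in for a pre-signed URL


def access_data(user: str, high_level_id: str) -> list:
    """Sketch: resolve low-level objects, check permission, return URLs."""
    low_level = CATALOG[high_level_id]                      # resolve low-level objects
    if not PERMISSIONS.get((user, high_level_id), False):   # permission check
        raise PermissionError(f"{user} may not read {high_level_id}")
    return [sign_url(p) for p in low_level]                 # URLs the user can fetch


print(access_data("alice", "sales_table"))
```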
-
Publication No.: US20240394271A1
Publication Date: 2024-11-28
Application No.: US18614380
Application Date: 2024-03-22
Applicant: Databricks, Inc.
Inventor: Bogdan Ionut Ghit , Juliusz Sompolski , Shi Xin , Bart Samwel
IPC: G06F16/2458 , G06F11/34 , G06F16/242 , G06F16/25
Abstract: The system is configured to: 1) receive a client request; 2) determine one or more executors to generate a response to the client request; 3) provide each of the one or more executors with an indication; 4) receive, for each indication, a response including an output that is either a cloud output or an in-line output, to generate a group of in-line outputs and a group of cloud outputs; 5) determine whether the group of in-line outputs comprises all outputs; and 6) in response to the group of in-line outputs not comprising all the outputs for the client request: a) convert the group of in-line outputs to a converted group of cloud outputs; b) generate metadata for the converted group of cloud outputs and the group of cloud outputs; and c) provide a response to the client request including the metadata for the converted group of cloud outputs and the group of cloud outputs.
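A sketch of the response-assembly step described above: if every executor output fits in-line it is returned directly, otherwise the in-line outputs are converted to cloud outputs and the client receives metadata pointing at all of them. The output format, upload function, and URL scheme are invented for illustration.

```python
def upload_to_cloud(payload: bytes) -> dict:
    # Stand-in for writing the payload to cloud storage and returning its location.
    return {"type": "cloud", "url": f"https://storage.example/{abs(hash(payload))}"}


def assemble_response(outputs: list) -> dict:
    """Sketch: return in-line outputs directly when they comprise all outputs;
    otherwise convert them to cloud outputs and return only metadata."""
    inline = [o for o in outputs if o["type"] == "inline"]
    cloud = [o for o in outputs if o["type"] == "cloud"]
    if not cloud:                      # every output fits in-line
        return {"inline": inline}
    converted = [upload_to_cloud(o["payload"]) for o in inline]
    return {"metadata": [c["url"] for c in converted + cloud]}


# Hypothetical usage with one in-line and one cloud output.
print(assemble_response([
    {"type": "inline", "payload": b"small result"},
    {"type": "cloud", "url": "https://storage.example/large-result"},
]))
```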
-
Publication No.: US12147555B1
Publication Date: 2024-11-19
Application No.: US17733485
Application Date: 2022-04-29
Applicant: Databricks, Inc.
Inventor: Matei Zaharia , Shixiong Zhu , Xiaotong Sun , Ramesh Chandra , Michael Paul Armbrust , Ali Ghodsi
Abstract: The present application discloses a method, system, and computer system for providing access to data. The method includes receiving, by a data manager service from a data requesting service, a request using an identifier for a high-level data object to access a set of data associated with the high-level data object; determining, by the data manager service, one or more low-level data objects corresponding to the set of data based on the identifier for the high-level data object; determining whether a user associated with the request has permission to access at least a subset of the one or more low-level data objects; and in response to determining that the user associated with the request has permission, generating, by the data manager service, a uniform resource locator (URL) via which the at least the subset of the one or more low-level data objects is accessible to the user.
-
Publication No.: US12124450B2
Publication Date: 2024-10-22
Application No.: US18160861
Application Date: 2023-01-27
Applicant: Databricks, Inc.
Inventor: Shoumik Palkar , Alexander Behm , Mostafa Mokhtar , Sriram Krishnamurthy
IPC: G06F16/2453 , G06F11/34 , G06F16/22
CPC classification number: G06F16/24545 , G06F11/3409 , G06F16/221
Abstract: Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. A data processing service receives a request to perform a query identifying a filter column and a non-filter column in a columnar database. The data processing service accesses a first task of contiguous rows in the filter column from a cloud-based object storage. The data processing service applies a filter defined by the query to the first task. The data processing service generates filter results for the first task that may include a percentage of the first task discarded and a run-time. The data processing service determines, based on the filter results for the first task, a likelihood value that indicates a likelihood of gaining a performance benefit by applying the lazy materialization technique to a second task of the query.
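A toy sketch of the decision step described above: run the filter on the first task, then use the fraction of rows discarded and the filter run-time to estimate whether lazy materialization (deferring reads of the non-filter columns) is likely to pay off for the next task. The scoring formula and threshold here are invented purely for illustration; the abstract does not disclose how the likelihood value is computed.

```python
def lazy_materialization_likelihood(discarded_fraction: float, run_time_s: float) -> float:
    """Sketch: more discarded rows means fewer non-filter values to
    materialize (more benefit); a slow filter reduces the expected gain."""
    return max(0.0, discarded_fraction - 0.1 * run_time_s)


def should_use_lazy_materialization(discarded_fraction: float, run_time_s: float) -> bool:
    # Apply lazy materialization to the next task only above an assumed threshold.
    return lazy_materialization_likelihood(discarded_fraction, run_time_s) > 0.5


# Hypothetical first-task filter results: 80% of rows discarded in 0.4 seconds.
print(should_use_lazy_materialization(discarded_fraction=0.8, run_time_s=0.4))  # True
```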