-
Publication No.: US20250165477A1
Publication date: 2025-05-22
Application No.: US18511902
Filing date: 2023-11-16
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust, Alexander Balikov, Boyang Peng
IPC: G06F16/2455, G06F9/48, G06F16/2453
Abstract: A database system performs pipelined execution of queries that process batches of streaming data. The database system compiles a database query to generate an execution plan and determines a set of stages based on the execution plan. The database query processes streaming data comprising batches. A scheduler schedules pipelined execution stages of the database query. Accordingly, the database system performs execution of a particular stage processing a batch of the streaming data in parallel with subsequent stages of the database query processing previous batches of the streaming data. The system further maintains watermarks for different stages of the database query.
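The pipelining described in this abstract can be illustrated with a small scheduling sketch. It is not the patented scheduler: the function names, the "one stage per tick" timing model, and the definition of a stage's watermark as the largest event timestamp it has processed are all assumptions made for illustration. Batches are assumed to be non-empty lists of non-negative timestamps.

```python
from typing import Dict, List, Tuple

Batch = List[int]  # hypothetical record format: each record is an event timestamp


def pipeline_schedule(num_stages: int, num_batches: int) -> List[List[Tuple[int, int]]]:
    """Return, for each tick, the (stage, batch) pairs that can run in parallel."""
    ticks = []
    for t in range(num_stages + num_batches - 1):
        # At tick t, stage s works on batch t - s, so later stages
        # handle earlier batches while stage 0 takes the newest one.
        running = [(s, t - s) for s in range(num_stages) if 0 <= t - s < num_batches]
        ticks.append(running)
    return ticks


def run_with_watermarks(batches: List[Batch], num_stages: int) -> List[Dict[int, int]]:
    """Walk the schedule and snapshot a per-stage watermark after every tick."""
    watermarks = {s: -1 for s in range(num_stages)}  # -1 means "nothing seen yet"
    snapshots = []
    for running in pipeline_schedule(num_stages, len(batches)):
        for stage, batch_id in running:
            watermarks[stage] = max(watermarks[stage], max(batches[batch_id]))
        snapshots.append(dict(watermarks))
    return snapshots
```

With three stages and two batches, the second tick runs stage 0 on batch 1 alongside stage 1 on batch 0, which is the overlap the abstract describes. The snapshots show that a downstream stage's watermark trails the upstream one until it catches up.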
-
12.
Publication No.: US20250156394A1
Publication date: 2025-05-15
Application No.: US18985397
Filing date: 2024-12-18
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
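The commit protocol in this abstract reads as a form of optimistic concurrency control over an append-only log. The sketch below is one possible in-memory reading of it, not the patented implementation: `TransactionLog`, `commit`, and `ConflictError` are hypothetical names, and a real log would need an atomic "create if absent" write rather than a dictionary check.

```python
from typing import Dict, Set


class ConflictError(Exception):
    """Raised when a concurrent commit touched files this transaction read."""


class TransactionLog:
    """Hypothetical in-memory log: position -> set of files updated at that position."""

    def __init__(self) -> None:
        self.entries: Dict[int, Set[str]] = {}

    def current_position(self) -> int:
        return max(self.entries, default=0)

    def try_write(self, position: int, updated_files: Set[str]) -> bool:
        # Stand-in for an atomic put-if-absent; fails if the slot is taken.
        if position in self.entries:
            return False
        self.entries[position] = set(updated_files)
        return True


def commit(log: TransactionLog, start_position: int,
           read_set: Set[str], updated_files: Set[str]) -> int:
    """Try position N+1; on a collision with no read/write overlap, try N+2, and so on."""
    position = start_position + 1
    while not log.try_write(position, updated_files):
        winner_files = log.entries[position]
        if read_set & winner_files:
            raise ConflictError(f"read set overlaps commit at position {position}")
        position += 1
    return position
```

For example, two transactions that both start from position 0 and touch disjoint files land at positions 1 and 2, while a third that read a file updated at position 1 is rejected.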
-
13.
Publication No.: US20230394029A1
Publication date: 2023-12-07
Application No.: US18236516
Filing date: 2023-08-22
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
CPC classification number: G06F16/2358, G06F16/148, G06F16/2282
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
-
Publication No.: US11567998B2
Publication date: 2023-01-31
Application No.: US17362450
Filing date: 2021-06-29
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust, Andreas Neumann, Mukul Murthy, Jonathan Mio
IPC: G06F16/901, G06F16/245, G06F16/22
Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.
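One way to picture the steps in this abstract is the sketch below: it derives dependencies from name references, orders the nodes topologically, and substitutes each view's expression into the tables that use it. This is an interpretation, not the patented method. The query format, the function names, and the token-matching rule for detecting dependencies are assumptions; `graphlib.TopologicalSorter` is the Python standard-library class (3.9+).

```python
import re
from graphlib import TopologicalSorter
from typing import Dict, Set, Tuple

# Hypothetical input format: name -> (kind, expression), kind is "table" or "view".
Queries = Dict[str, Tuple[str, str]]


def find_dependencies(queries: Queries) -> Dict[str, Set[str]]:
    """A query depends on every other query whose name appears in its expression."""
    deps = {}
    for name, (_, expr) in queries.items():
        tokens = set(re.findall(r"[A-Za-z_]\w*", expr))
        deps[name] = {other for other in queries if other != name and other in tokens}
    return deps


def build_dataflow_graph(queries: Queries) -> Dict[str, str]:
    """Return table expressions with references to view nodes replaced in-line."""
    deps = find_dependencies(queries)
    order = list(TopologicalSorter(deps).static_order())  # upstream nodes first
    resolved: Dict[str, str] = {}
    for name in order:
        _, expr = queries[name]
        for dep in deps[name]:
            if queries[dep][0] == "view":
                # Substitute the view's already-resolved expression as a subquery.
                expr = re.sub(rf"\b{re.escape(dep)}\b",
                              lambda _m, sub=resolved[dep]: f"({sub})", expr)
        resolved[name] = expr
    return {n: resolved[n] for n in order if queries[n][0] == "table"}
```

In this reading only table nodes remain in the output; a view never materializes and exists solely as an expression folded into its consumers.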
-
15.
Publication No.: US20220253424A1
Publication date: 2022-08-11
Application No.: US17695411
Filing date: 2022-03-15
Applicant: Databricks Inc.
Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
-
16.
Publication No.: US11308071B2
Publication date: 2022-04-19
Application No.: US16941227
Filing date: 2020-07-28
Applicant: Databricks Inc.
Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
-
17.
Publication No.: US12189607B2
Publication date: 2025-01-07
Application No.: US18236516
Filing date: 2023-08-22
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
-
Publication No.: US20250005076A1
Publication date: 2025-01-02
Application No.: US18658418
Filing date: 2024-05-08
Applicant: Databricks, Inc.
Inventor: Michael Paul Armbrust, Andreas Neumann, Mukul Murthy, Jonathan Mio
IPC: G06F16/901, G06F16/215, G06F16/22, G06F16/245
Abstract: A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries. The processor is coupled to the communication interface and is configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; insert a node in the DAG of nodes to generate an updated DAG to enforce an expectation; determine a dataflow graph based on the updated DAG; and provide the dataflow graph.
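The node-insertion step in this abstract can be sketched as a small graph rewrite. The abstract does not say what an expectation is or where its node goes, so the sketch assumes it is a check placed between a target node and that node's downstream consumers. The DAG representation (node name mapped to its set of upstream nodes) and the function name `insert_expectation` are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set

Dag = Dict[str, Set[str]]  # node -> set of upstream nodes it reads from


def insert_expectation(dag: Dag, target: str, expectation_name: str) -> Dag:
    """Return an updated DAG where consumers of `target` read from a new check node."""
    # Copy so the original DAG is left untouched.
    updated = {node: set(upstream) for node, upstream in dag.items()}
    for upstream in updated.values():
        if target in upstream:
            # Reroute the edge: consumer -> expectation instead of consumer -> target.
            upstream.discard(target)
            upstream.add(expectation_name)
    # The expectation node itself reads from the target.
    updated[expectation_name] = {target}
    return updated


def execution_order(dag: Dag) -> list:
    """Upstream-first ordering of the DAG's nodes."""
    return list(TopologicalSorter(dag).static_order())
```

Because every consumer is rerouted through the new node, no downstream node can see the target's output without the check having run first.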
-
Publication No.: US12182292B1
Publication date: 2024-12-31
Application No.: US18162353
Filing date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Matei Zaharia, Shixiong Zhu, Xiaotong Sun, Ramesh Chandra, Michael Paul Armbrust, Ali Ghodsi
Abstract: The present application discloses a method, system, and computer system for providing access to data. The method includes receiving, by a data manager service from a data requesting service, a request using an identifier for a high-level data object to access a set of data associated with the high-level data object, determining, by the data manager service, low-level data object(s) corresponding to the set of data based on the identifier for the high-level data object, determining whether a user associated with the request has permission to access at least a subset of the low-level data object(s), and in response to determining that the user associated with the request has permission to access the at least the subset of the low-level data object(s), generating, by the data manager service, a uniform resource locator (URL) via which the at least the subset of the one or more low-level data objects is accessible by the user.
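The request flow in this abstract (resolve a high-level identifier to low-level objects, check permissions, return URLs) can be sketched as follows. Everything concrete here is an assumption for illustration: the catalog and grant tables, the `storage.example.com` host, the demo secret, and the HMAC-signed query string are stand-ins, not the mechanism the patent claims.

```python
import hashlib
import hmac
from typing import List
from urllib.parse import urlencode

# Hypothetical metadata: high-level identifier -> low-level objects (files).
CATALOG = {
    "sales.orders": ["bucket/orders/part-0.parquet", "bucket/orders/part-1.parquet"],
}
# Hypothetical grants: user -> low-level objects that user may read.
GRANTS = {"alice": {"bucket/orders/part-0.parquet"}}
SECRET = b"demo-secret"  # placeholder signing key for the sketch only


def resolve_and_authorize(user: str, identifier: str) -> List[str]:
    """Map the identifier to low-level objects and keep those the user may access."""
    low_level = CATALOG.get(identifier, [])
    allowed = GRANTS.get(user, set())
    return [obj for obj in low_level if obj in allowed]


def make_url(user: str, obj: str) -> str:
    """Build a signed URL for one low-level object."""
    sig = hmac.new(SECRET, f"{user}:{obj}".encode(), hashlib.sha256).hexdigest()
    return f"https://storage.example.com/{obj}?" + urlencode({"user": user, "sig": sig})


def request_access(user: str, identifier: str) -> List[str]:
    """Data-manager entry point: one URL per permitted low-level object."""
    return [make_url(user, obj) for obj in resolve_and_authorize(user, identifier)]
```

The caller only ever names the high-level object; the mapping to files and the permission check both stay inside the data manager, and a user with no grants receives an empty list.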
-
Publication No.: US12147555B1
Publication date: 2024-11-19
Application No.: US17733485
Filing date: 2022-04-29
Applicant: Databricks, Inc.
Inventor: Matei Zaharia, Shixiong Zhu, Xiaotong Sun, Ramesh Chandra, Michael Paul Armbrust, Ali Ghodsi
Abstract: The present application discloses a method, system, and computer system for providing access to data. The method includes receiving, by a data manager service from a data requesting service, a request using an identifier for a high-level data object to access a set of data associated with the high-level data object, determining, by the data manager service, low-level data object(s) corresponding to the set of data based on the identifier for the high-level data object, determining whether a user associated with the request has permission to access at least a subset of the low-level data object(s), and in response to determining that the user associated with the request has permission to access the at least the subset of the low-level data object(s), generating, by the data manager service, a uniform resource locator (URL) via which the at least the subset of the one or more low-level data objects is accessible by the user.