-
Publication No.: US12117983B2
Publication date: 2024-10-15
Application No.: US18512028
Application date: 2023-11-17
Applicant: Databricks, Inc.
Inventor: Aaron Daniel Davidson, Clemens Mewald, Tomas Nykodym
IPC: G06F16/00, G06F16/21, G06F16/955, G06N5/022
CPC classification number: G06F16/219, G06F16/955, G06N5/022
Abstract: A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.
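The deployment flow in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `ModelProcess`, `Proxy`, and `deploy` are hypothetical names, the model registry is omitted, and the "process" is an in-process object rather than a real operating-system process.

```python
from typing import Callable, Dict


class ModelProcess:
    """Hypothetical stand-in for a process serving one model version."""

    def __init__(self, version: str, predict: Callable[[float], float]) -> None:
        self.version = version
        self._predict = predict

    def invoke(self, x: float) -> float:
        return self._predict(x)


class Proxy:
    """Maps version information to the process that serves that version."""

    def __init__(self) -> None:
        self._routes: Dict[str, ModelProcess] = {}

    def update(self, version: str, process: ModelProcess) -> None:
        self._routes[version] = process

    def invoke(self, version: str, x: float) -> float:
        # Redirect the invocation to the process registered for this version.
        return self._routes[version].invoke(x)


def deploy(store: Dict[str, Callable[[float], float]], proxy: Proxy,
           version: str, predict: Callable[[float], float]) -> ModelProcess:
    store[version] = predict                  # store the model version
    process = ModelProcess(version, predict)  # "start" a process running it
    proxy.update(version, process)            # update the proxy's routing
    return process
```

Callers address the proxy by version only, so a new version can be brought up and routed without callers knowing which process serves it.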
-
Publication No.: US12072880B2
Publication date: 2024-08-27
Application No.: US17892376
Application date: 2022-08-22
Applicant: Databricks, Inc.
Inventor: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy
IPC: G06F9/00, G06F16/2453, G06F16/28
CPC classification number: G06F16/24542, G06F16/285
Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using the first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.
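A minimal sketch of the engine fallback described in this abstract, under stated assumptions: the two engines, the `UnsupportedInput` error, and the first-line heuristic are all hypothetical, chosen only to make the control flow concrete.

```python
from typing import List, Tuple


class UnsupportedInput(Exception):
    """The 'particular error' that triggers the switch (hypothetical)."""


def fast_engine(line: str) -> int:
    # Fast path: handles plain digit strings only.
    if not line.isdigit():
        raise UnsupportedInput(line)
    return int(line)


def robust_engine(line: str) -> int:
    # Slower path: also tolerates whitespace and thousands separators.
    return int(line.strip().replace(",", ""))


def pick_first_engine(lines: List[str]) -> str:
    # Assumed heuristic: start fast if the first line looks simple.
    return "fast" if lines and lines[0].isdigit() else "robust"


def parse_file(lines: List[str]) -> Tuple[List[int], List[str]]:
    engine = pick_first_engine(lines)
    used = [engine]
    values: List[int] = []
    for line in lines:
        if engine == "fast":
            try:
                values.append(fast_engine(line))
                continue
            except UnsupportedInput:
                # Stop using the first engine; continue with the second.
                engine = "robust"
                used.append(engine)
        values.append(robust_engine(line))
    return values, used
```

In this sketch the second engine resumes at the line that failed, so results already produced by the first engine are kept rather than recomputed.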
-
Publication No.: US20240265011A1
Publication date: 2024-08-08
Application No.: US18222343
Application date: 2023-07-14
Applicant: Databricks, Inc.
Inventor: Saksham Garg, Bogdan Ionut Ghit, Christopher Stevens, Christian Stuart
IPC: G06F16/2453
CPC classification number: G06F16/24539
Abstract: A multi-cluster computing system which includes a query result caching system is presented. The multi-cluster computing system may include a data processing service and client devices communicatively coupled over a network. The data processing service may include a control layer and a data layer. The control layer may be configured to receive and process requests from the client devices and manage resources in the data layer. The data layer may be configured to include instances of clusters of computing resources for executing jobs. The data layer may include a data storage system, which further includes a remote query result cache store. The query result cache store may include a cloud storage query result cache which stores data associated with results of previously executed requests. As such, when a cluster encounters a previously executed request, the cluster may efficiently retrieve the cached result of the request from the in-memory query result cache or the cloud storage query result cache.
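The two-tier lookup described in this abstract might look roughly like the sketch below. It is an illustration only: `Cluster` is a hypothetical class, a shared dictionary stands in for the cloud storage cache, and keying on the raw query text is an assumption (a real system would also need to account for changes in the underlying data).

```python
import hashlib
from typing import Callable, Dict, List


class Cluster:
    def __init__(self, remote_cache: Dict[str, List[tuple]]) -> None:
        self.local_cache: Dict[str, List[tuple]] = {}  # in-memory, per cluster
        self.remote_cache = remote_cache               # shared across clusters
        self.executions = 0

    def run(self, query: str,
            execute: Callable[[str], List[tuple]]) -> List[tuple]:
        key = hashlib.sha256(query.encode("utf-8")).hexdigest()
        if key in self.local_cache:
            return self.local_cache[key]
        if key in self.remote_cache:
            # Another cluster already ran this query; reuse its result.
            result = self.remote_cache[key]
            self.local_cache[key] = result
            return result
        # Cache miss in both tiers: execute and populate both caches.
        self.executions += 1
        result = execute(query)
        self.local_cache[key] = result
        self.remote_cache[key] = result
        return result
```

With two `Cluster` objects sharing one `remote_cache`, a query executed on the first is served to the second from the shared tier without re-execution.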
-
Publication No.: US20240256550A1
Publication date: 2024-08-01
Application No.: US18162616
Application date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Utkarsh Agarwal, Shoumik Palkar, Alexander Behm, Sriram Krishnamurthy
IPC: G06F16/2455, G06F11/34, G06F16/22
CPC classification number: G06F16/24558, G06F11/3409, G06F16/221
Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator and a request to return information about a value of interest in a columnar dataset stored on cloud storage. At least one column in the columnar dataset is based on a dictionary. The dictionary maps one or more values for a column to one or more respective identifiers. The method determines whether to perform dictionary filtering for the query by calculating a metric based on one or more factors. Responsive to the metric being below a threshold, which may be predetermined, the method performs the dictionary filtering.
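As a rough sketch of the decision described here: the abstract does not say which factors feed the metric, so the ratio of dictionary size to row count used below is purely an assumption, as are the function name `filter_rows` and the default threshold.

```python
from typing import Callable, List


def filter_rows(dictionary: List[str], codes: List[int],
                predicate: Callable[[str], bool],
                threshold: float = 0.5) -> List[int]:
    """Return the row indices whose decoded value satisfies the predicate."""
    # Assumed metric: a small dictionary relative to the row count makes it
    # cheap to test the predicate once per distinct value.
    metric = len(dictionary) / max(len(codes), 1)
    if metric < threshold:
        # Dictionary filtering: evaluate the predicate on dictionary entries.
        matching = {i for i, value in enumerate(dictionary) if predicate(value)}
        if not matching:
            return []  # no entry matches, so no row can match
        return [row for row, code in enumerate(codes) if code in matching]
    # Otherwise evaluate the predicate row by row on decoded values.
    return [row for row, code in enumerate(codes)
            if predicate(dictionary[code])]
```

Both branches return the same rows; the metric only decides which evaluation strategy is likely to be cheaper.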
-
Publication No.: US20240256549A1
Publication date: 2024-08-01
Application No.: US18162607
Application date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Utkarsh Agarwal, Shoumik Palkar, Alexander Behm, Sriram Krishnamurthy
IPC: G06F16/2455, G06F11/34, G06F16/22
CPC classification number: G06F16/24558, G06F11/3409, G06F16/221
Abstract: Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator for a columnar dataset on cloud storage. At least one column in the dataset is based on a dictionary, and the dictionary maps one or more values for a column to one or more respective identifiers. The method evaluates the operator on one or more values of the dictionary to generate an updated dictionary comprising updated values. The method may decode the updated dictionary into an updated column comprising updated data values.
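The idea of evaluating an operator on the dictionary instead of on every row can be illustrated with a small sketch. The helper names `apply_to_dictionary` and `decode` are hypothetical, and the column is modeled as a plain list of integer codes.

```python
from typing import Callable, List


def apply_to_dictionary(dictionary: List[str],
                        op: Callable[[str], str]) -> List[str]:
    # Evaluate the operator once per distinct value, not once per row.
    return [op(value) for value in dictionary]


def decode(dictionary: List[str], codes: List[int]) -> List[str]:
    # Expand the identifier column back into data values.
    return [dictionary[code] for code in codes]


dictionary = ["red", "green"]
codes = [0, 1, 1, 0, 0]
updated_dictionary = apply_to_dictionary(dictionary, str.upper)
updated_column = decode(updated_dictionary, codes)
```

Here the operator runs twice (once per dictionary entry) although the column has five rows; the codes are reused unchanged.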
-
Publication No.: US20240202211A1
Publication date: 2024-06-20
Application No.: US18219314
Application date: 2023-07-07
Applicant: Databricks, Inc.
Inventor: Alexander Balikov, Tathagata Das, Karthikeyan Ramasamy
IPC: G06F16/27, G06F16/2455
CPC classification number: G06F16/278, G06F16/24568
Abstract: A data processing service performs a rebalancing process for rebalancing stateful tasks on a cluster computing system. In one instance, the method for rebalancing stateful tasks is performed such that the per-operator partitions are spread across available executors of a cluster of the cluster computing system with respect to one or more statistics of the tasks. In one instance, the method for rebalancing stateful tasks is also performed such that the total number of stateful tasks is balanced per executor as long as this rebalancing does not imbalance the per-operator placements. In this way, the processing of stateful tasks can be spread across multiple executors in a relatively uniform manner, even though there may be an upfront cost of breaking the local caching on an executor.
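One way to picture the two balancing goals is the greedy sketch below. It is an assumption-laden simplification: it uses partition counts as the only statistic, ignores the cost of moving state, and the function name `rebalance` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rebalance(partitions_per_operator: Dict[str, int],
              executors: List[str]) -> Dict[Tuple[str, int], str]:
    """Assign each (operator, partition) pair to an executor."""
    total: Counter = Counter()  # stateful tasks per executor, all operators
    placement: Dict[Tuple[str, int], str] = {}
    for operator, count in partitions_per_operator.items():
        per_operator: Counter = Counter()  # tasks of this operator per executor
        for partition in range(count):
            # Primary goal: spread this operator's partitions evenly.
            # Tie-break: prefer the executor with the lowest total load,
            # which never worsens the per-operator balance.
            target = min(executors,
                         key=lambda e: (per_operator[e], total[e], e))
            placement[(operator, partition)] = target
            per_operator[target] += 1
            total[target] += 1
    return placement
```

Because total load is only used to break ties, the per-operator spread always takes precedence, mirroring the ordering of the two conditions in the abstract.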
-
Publication No.: US11948084B1
Publication date: 2024-04-02
Application No.: US18162291
Application date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Sue Ann Hong, Shi Xin, Timothee Hunter, Ali Ghodsi
CPC classification number: G06N3/08, G06N3/04, G06N3/063, G06N5/022, G06N5/027, G06F16/14, G06F16/22
Abstract: A function creation method is disclosed. The method comprises defining one or more database function inputs, defining cluster processing information, defining a deep learning model, and defining one or more database function outputs. A database function is created based at least in part on the one or more database function inputs, the cluster set-up information, the deep learning model, and the one or more database function outputs. In some embodiments, the database function enables a non-technical user to utilize deep learning models.
-
Publication No.: US20240070155A1
Publication date: 2024-02-29
Application No.: US17895882
Application date: 2022-08-25
Applicant: Databricks, Inc.
Inventor: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel
IPC: G06F16/2455, G06F16/22
CPC classification number: G06F16/2456, G06F16/2282
Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, and obtaining other resulting files based at least in part on a second set of unmatched rows among the target table and the source table that results from the first set of unmatched rows having been processed in the second job, and obtaining a resulting table based on (i) second job resulting file(s), and (ii) other resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a first matching action based on matched rows and a second matching action based on a subset of unmatched rows.
-
Publication No.: US20240069863A1
Publication date: 2024-02-29
Application No.: US17895872
Application date: 2022-08-25
Applicant: Databricks, Inc.
Inventor: Bart Samwel, Tathagata Das, Lars Kroll, Yijia Cui, Juliusz Sompolski, Tom Van Bussel
CPC classification number: G06F7/14, G06F16/148, G06F16/16
Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first, second, and third jobs, and obtaining a resulting table based at least in part on the second job resulting file(s) and third job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s). Performing the third job includes determining unmatched rows for target table files and storing the unmatched rows in third job resulting file(s).
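The three jobs can be sketched on in-memory data as follows. Everything concrete here is an assumption made for illustration: files are modeled as lists of row dictionaries, rows match on a hypothetical `id` key, the matching action is an update of `value`, and the output file names are invented.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]
Files = Dict[str, List[Row]]


def find_matches(target: Files, source: Dict[int, str]) -> Dict[str, Set[int]]:
    # Job 1: for each target file with a match, record which rows match.
    matches: Dict[str, Set[int]] = {}
    for name, rows in target.items():
        hit = {i for i, row in enumerate(rows) if row["id"] in source}
        if hit:
            matches[name] = hit
    return matches


def merge_update(target: Files, source: Dict[int, str]) -> Files:
    matches = find_matches(target, source)
    # Files with no matching rows are carried over untouched.
    result: Files = {n: rows for n, rows in target.items() if n not in matches}
    for name, hit in matches.items():
        rows = target[name]
        # Job 2: apply the matching action (here, an update) to matched rows.
        result[name + ".matched"] = [
            {"id": rows[i]["id"], "value": source[rows[i]["id"]]}
            for i in sorted(hit)
        ]
        # Job 3: store the unmatched rows of the same file unchanged.
        result[name + ".unmatched"] = [
            row for i, row in enumerate(rows) if i not in hit
        ]
    return result
```

Recording the matching row positions in the first job lets the later jobs split each affected file into matched and unmatched rows without repeating the match test against the source.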
-
Publication No.: US20240061840A1
Publication date: 2024-02-22
Application No.: US18162366
Application date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Prashanth Menon, Alexander Behm, Sriram Krishnamurthy
IPC: G06F16/2453, G06F16/28
CPC classification number: G06F16/24542, G06F16/285
Abstract: The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using the first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.