Patent search case: "Databricks Inc." Page 8

71.

Invention application
AUTO MAINTENANCE FOR DATA TABLES IN CLOUD STORAGE (Valid)

Publication (announcement) number: US20250130981A1

Publication (announcement) date: 2025-04-24

Application number: US18986345

Application date: 2024-12-18

Applicant: Databricks, Inc.

Inventor: Vijayan Prabhakaran, Himanshu Raja, Rahul Potharaju, Naga Raju Bhanoori, Lin Ma, Rajesh Parangi Sharabhalingappa, Jintian Liang, Zachary Vaughn Schuermann, Kam Cheung Ting

IPC: G06F16/21, G06F11/34, G06F16/22

Abstract: Disclosed is a configuration for managing the organization of data tables in cloud-based storage. The configuration receives metrics for data processing operations on the data table. Metrics include at least one of a size of the data table, a size of each file in the data table, and metadata describing the data table. The configuration automatically executes a cost-benefit analysis based on the one or more metrics for each candidate maintenance operation in a plurality of candidate maintenance operations. The configuration automatically selects a maintenance operation from the candidate maintenance operations to automate based on the cost-benefit analysis of the one or more candidate maintenance operations. The selected maintenance operation is automated and scheduled on the data table.
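The cost-benefit selection described in this abstract can be pictured with a small Python sketch (Python 3.9+). It is an illustration only: the metric fields, the candidate operation names, the 32 MiB threshold, and the cost and benefit formulas are all assumptions made here, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TableMetrics:
    """Hypothetical per-table metrics (field names are assumptions)."""
    table_size_bytes: int
    file_sizes_bytes: tuple[int, ...]


@dataclass(frozen=True)
class Candidate:
    """A candidate maintenance operation with illustrative estimators."""
    name: str
    cost: Callable[[TableMetrics], float]
    benefit: Callable[[TableMetrics], float]


def select_operation(metrics: TableMetrics,
                     candidates: list[Candidate]) -> Optional[str]:
    """Return the candidate with the highest positive net benefit, or None."""
    best_name, best_net = None, 0.0
    for c in candidates:
        net = c.benefit(metrics) - c.cost(metrics)
        if net > best_net:
            best_name, best_net = c.name, net
    return best_name


SMALL_FILE = 32 * 1024 * 1024  # assumed "small file" threshold (32 MiB)


def small_files(m: TableMetrics) -> list[int]:
    """Sizes of the files below the assumed small-file threshold."""
    return [s for s in m.file_sizes_bytes if s < SMALL_FILE]


candidates = [
    # Compaction: benefit grows with the small-file count, cost with bytes rewritten.
    Candidate(
        "compact_small_files",
        cost=lambda m: sum(small_files(m)) / 1e9,
        benefit=lambda m: 0.5 * len(small_files(m)),
    ),
    # Re-clustering: assumed flat benefit, cost proportional to a full rewrite.
    Candidate(
        "recluster",
        cost=lambda m: m.table_size_bytes / 1e9,
        benefit=lambda m: 1.0,
    ),
]

sizes = tuple([1_000_000] * 200 + [1_000_000_000] * 9)
metrics = TableMetrics(table_size_bytes=sum(sizes), file_sizes_bytes=sizes)
chosen = select_operation(metrics, candidates)
print(chosen)
```

With 200 small files and 9 large ones, compaction has the larger net benefit under these made-up formulas, so it is the operation that would be scheduled.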

72.

Invention application
USING LLM FUNCTIONS TO EVALUATE AND COMPARE LARGE TEXT OUTPUTS OF LLMS (Valid)

Publication (announcement) number: US20250124236A1

Publication (announcement) date: 2025-04-17

Application number: US18518155

Application date: 2023-11-22

Applicant: Databricks, Inc.

Inventor: Ridhima Gupta, Prithvi Kannan, Sunish Sohil Sheth, Kasey Uhlenhuth, Hubert Zub, Corey Zumar

IPC: G06F40/40, G06F40/103, G06F40/30

Abstract: A method for evaluating textual output of one or more machine-learned language models is presented. The method includes receiving, from a user of a client device, a first prompt for input to one or more machine-learned language models, providing the first prompt to the one or more models for execution, and receiving a set of generated responses to the first prompt from the one or more models. The method further includes generating a user interface (UI) on the client device displaying the first prompt and generated responses as a table user interface element. The method applies a selected evaluation function to the generated response to evaluate the response with respect to an evaluation objective and identifies words that influence the evaluation. The method generates one or more UI elements on the UI to display the results of the evaluation for the generated responses.

73.

Invention grant
Clean room generation for data collaboration and executing clean room task in data processing pipeline (Valid)

Publication (announcement) number: US12260003B1

Publication (announcement) date: 2025-03-25

Application number: US18474708

Application date: 2023-09-26

Applicant: Databricks, Inc.

Inventor: William Chau, Abhijit Chakankar, Stephen Michael Mahoney, Daniel Seth Morris, Itai Shlomo Weiss

IPC: G06F21/00, G06F21/62

Abstract: A data processing service facilitates the creation and processing of data processing pipelines. The pipelines process data processing jobs defined with respect to a set of tasks in a sequence, with data dependencies associated with each separate task such that the output from one task is used as input for a subsequent task. In various embodiments, the set of tasks includes at least one cleanroom task that is executed in a cleanroom station and at least one non-cleanroom task executed in an execution environment of a user, where each task is configured to read one or more input datasets and transform the one or more input datasets into one or more output datasets.
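A pipeline of tasks with data dependencies, some tagged as cleanroom tasks, might be sketched as follows in Python 3.9+. The `Task` class, the environment labels, and the example tasks are hypothetical. The sketch only orders tasks by their dependencies and records where each one would run; it does not model the isolation a real cleanroom station would provide.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class Task:
    environment: str               # "cleanroom" or "user" (labels are assumptions)
    depends_on: tuple[str, ...]    # names of upstream tasks
    fn: Callable[..., object]      # receives upstream outputs in depends_on order


def run_pipeline(tasks: dict[str, Task]):
    """Run tasks in dependency order; return outputs and an execution trace."""
    graph = {name: set(t.depends_on) for name, t in tasks.items()}
    outputs, trace = {}, []
    for name in TopologicalSorter(graph).static_order():
        task = tasks[name]
        # Each task reads the outputs of its upstream tasks as its inputs.
        outputs[name] = task.fn(*(outputs[d] for d in task.depends_on))
        trace.append((name, task.environment))
    return outputs, trace


tasks = {
    "load_a": Task("user", (), lambda: [1, 2, 3]),
    "load_b": Task("user", (), lambda: [2, 3, 4]),
    # The join over both parties' data is the step tagged as a cleanroom task.
    "overlap": Task("cleanroom", ("load_a", "load_b"),
                    lambda a, b: sorted(set(a) & set(b))),
    "report": Task("user", ("overlap",), lambda rows: len(rows)),
}

outputs, trace = run_pipeline(tasks)
print(outputs["report"], trace)
```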

74.

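Invention application
K-D Tree Balanced Splitting (Valid)

Publication (announcement) number: US20250086155A1

Publication (announcement) date: 2025-03-13

Application number: US18772758

Application date: 2024-07-15

Applicant: Databricks, Inc.

Inventor: Bart Samwel, Prakhar Jain

IPC: G06F16/22, G06F16/28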

Abstract: A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.
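One way to picture split points across dimensions that yield files of an approximate target size is a textbook k-d style median split. The Python sketch below (3.9+) is that generic technique, not the patented method. It uses a row count as a stand-in for file size, and the function name and parameters are assumptions.

```python
def split_into_files(rows: list[tuple], target_rows: int,
                     depth: int = 0) -> list[list[tuple]]:
    """Split rows at the median of one dimension per level until every
    group holds at most target_rows rows (a proxy for a target file size)."""
    if target_rows < 1:
        raise ValueError("target_rows must be at least 1")
    if len(rows) <= target_rows:
        return [rows]
    dim = depth % len(rows[0])            # cycle through dimensions, k-d style
    ordered = sorted(rows, key=lambda r: r[dim])
    mid = len(ordered) // 2               # median split keeps halves balanced
    return (split_into_files(ordered[:mid], target_rows, depth + 1)
            + split_into_files(ordered[mid:], target_rows, depth + 1))


# 64 two-dimensional rows split toward files of about 16 rows each.
rows = [(x, y) for x in range(8) for y in range(8)]
files = split_into_files(rows, target_rows=16)
print([len(f) for f in files])
```

Because each level splits at the median, sibling groups differ in size by at most one row, which is what keeps the resulting files close to the target.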

75.

Invention grant
Clustering key selection based on machine-learned key selection models for data processing service (Valid)

Publication (announcement) number: US12229169B1

Publication (announcement) date: 2025-02-18

Application number: US18501830

Application date: 2023-11-03

Applicant: Databricks, Inc.

Inventor: Terry Kim, Lin Ma, Rahul Shivu Mahadev, Rahul Potharaju

IPC: G06F16/28, G06F16/21, G06F16/22

Abstract: The disclosed configurations provide a method (and/or a computer-readable medium or system) for determining, from a table schema describing keys of a data table, one or more clustering keys that can be used to cluster data files of a data table. The method includes generating features for the data table, generating tokens from the features, generating a prediction for each token by applying to the token a machine-learned transformer model trained to predict a likelihood that the key associated with the token is a clustering key for the data table, determining clustering keys based on the predictions, and clustering data records of the data table into data files based on key-values for the clustering keys.

76.

Invention grant
Checkpoint and restore based startup of executor nodes of a distributed computing engine for processing queries (Valid)

Publication (announcement) number: US12229137B1

Publication (announcement) date: 2025-02-18

Application number: US18412438

Application date: 2024-01-12

Applicant: Databricks, Inc.

Inventor: Xinyang Ge, Lixiang Ao, Haonan Jing, Aaron Daniel Davidson

IPC: G06F16/2453

Abstract: A system performs efficient startup of executors of a distributed computing engine used for processing queries, for example, database queries. The system starts an executor node and processes a set of queries using the executor node to warm up the executor node. The system performs a checkpoint of the warmed-up executor node to create an image. The image is restored in the target executor nodes. The system may store a checkpoint image for each configuration of an executor node. The configuration is determined based on various factors including the hardware of the executor node, memory allocation of the processes, and so on. The use of restore based on checkpoint images improves the efficiency of executor node startup.

77.

Invention grant
Multi-cluster query result caching (Valid)

Publication (announcement) number: US12189625B2

Publication (announcement) date: 2025-01-07

Application number: US18222343

Application date: 2023-07-14

Applicant: Databricks, Inc.

Inventor: Bogdan Ionut Ghit, Saksham Garg, Christian Stuart, Christopher Stevens

IPC: G06F16/24, G06F16/2453, G06F16/25, G06F16/28

Abstract: A multi-cluster computing system which includes a query result caching system is presented. The multi-cluster computing system may include a data processing service and client devices communicatively coupled over a network. The data processing service may include a control layer and a data layer. The control layer may be configured to receive and process requests from the client devices and manage resources in the data layer. The data layer may be configured to include instances of clusters of computing resources for executing jobs. The data layer may include a data storage system, which further includes a remote query result cache store. The query result cache store may include a cloud storage query result cache which stores data associated with results of previously executed requests. As such, when a cluster encounters a previously executed request, the cluster may efficiently retrieve the cached result of the request from the in-memory query result cache or the cloud storage query result cache.
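The lookup order implied by this abstract (in-memory cache first, then the shared cloud storage cache, then execution) can be sketched in Python as below. Plain dictionaries stand in for both cache tiers, and the class name, key scheme, and return labels are assumptions. Invalidation when the underlying data changes is deliberately left out.

```python
import hashlib
from typing import Callable


def cache_key(query: str) -> str:
    """Key a query by its whitespace-normalized text (an assumed scheme)."""
    normalized = " ".join(query.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TwoTierResultCache:
    """Per-cluster in-memory cache in front of a store shared by all clusters."""

    def __init__(self, shared_store: dict):
        self.local: dict = {}         # stands in for one cluster's in-memory cache
        self.shared = shared_store    # stands in for the cloud storage cache

    def get_or_execute(self, query: str, execute: Callable[[str], object]):
        """Return (result, source), where source says which tier answered."""
        key = cache_key(query)
        if key in self.local:
            return self.local[key], "local"
        if key in self.shared:
            self.local[key] = self.shared[key]   # promote to the local tier
            return self.local[key], "shared"
        result = execute(query)
        self.local[key] = result
        self.shared[key] = result                # publish for other clusters
        return result, "executed"
```

In use, two `TwoTierResultCache` instances built over the same dictionary behave like two clusters: a query executed through one is served from the shared tier by the other, and from the local tier on any repeat.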

78.

Invention application
FEATURE FUNCTION BASED COMPUTATION OF ON-DEMAND FEATURES OF MACHINE LEARNING MODELS (Valid)

Publication (announcement) number: US20240412095A1

Publication (announcement) date: 2024-12-12

Application number: US18206460

Application date: 2023-06-06

Applicant: Databricks, Inc.

Inventor: Matei Zaharia, Avesh Singh, Mani Parkhe, Maxim Lukiyanov, Xiangrui Meng, Aakrati Talati, Chenen Liang, Kasey Uhlenhuth

IPC: G06N20/00

Abstract: A system performs training and execution of machine learning models that use on-demand features using feature functions. The system receives commands for registering metadata associated with a machine learning model. The machine learning model may process a set of features including on-demand features as well as other features such as batch features. The system executes the command by storing an association between the machine learning model and the feature functions associated with any on-demand features processed by the machine learning model. The feature functions are executed using an end point of a data asset service. The use of the data asset service for invoking the feature functions ensures that the same set of instructions is executed during model training and model inferencing, thereby avoiding model skew.
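The idea of routing both training and inference through the same registered feature function can be illustrated with a small in-process registry in Python (3.9+). Everything named here (`feature_function`, `register_model`, `compute_features`, the example feature) is hypothetical, and the registry stands in for what the abstract calls a data asset service endpoint.

```python
from typing import Callable

FEATURE_FUNCTIONS: dict[str, Callable[[dict], float]] = {}
MODEL_FEATURES: dict[str, list[str]] = {}


def feature_function(name: str):
    """Decorator that registers an on-demand feature function under a name."""
    def register(fn):
        FEATURE_FUNCTIONS[name] = fn
        return fn
    return register


@feature_function("price_per_unit")
def price_per_unit(raw: dict) -> float:
    return raw["total"] / raw["quantity"]


def register_model(model_name: str, feature_names: list[str]) -> None:
    """Store the association between a model and its feature functions."""
    missing = [n for n in feature_names if n not in FEATURE_FUNCTIONS]
    if missing:
        raise KeyError(f"unregistered feature functions: {missing}")
    MODEL_FEATURES[model_name] = list(feature_names)


def compute_features(model_name: str, raw: dict) -> dict:
    """Single code path used for training rows and for inference requests."""
    return {n: FEATURE_FUNCTIONS[n](raw) for n in MODEL_FEATURES[model_name]}


register_model("pricing_model", ["price_per_unit"])
training_row = compute_features("pricing_model", {"total": 30.0, "quantity": 4})
serving_row = compute_features("pricing_model", {"total": 30.0, "quantity": 4})
print(training_row == serving_row)
```

Since there is only one implementation of each feature, the training pipeline and the serving path cannot drift apart, which is the skew the abstract refers to.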

79.

Invention application
AUTO MAINTENANCE FOR DATA TABLES IN CLOUD STORAGE (Valid)

Publication (announcement) number: US20240378181A1

Publication (announcement) date: 2024-11-14

Application number: US18144647

Application date: 2023-05-08

Applicant: Databricks, Inc.

Inventor: Vijayan Prabhakaran, Himanshu Raja, Rahul Potharaju, Naga Raju Bhanoori, Lin Ma, Rajesh Parangi Sharabhalingappa, Jintian Liang, Zach Schuermann, Kam Cheung Ting

IPC: G06F16/21, G06F11/34, G06F16/22

Abstract: Disclosed is a configuration for managing the organization of data tables in cloud-based storage. The configuration receives metrics for data processing operations on the data table. Metrics include at least one of a size of the data table, a size of each file in the data table, and metadata describing the data table. The configuration automatically executes a cost-benefit analysis based on the one or more metrics for each candidate maintenance operation in a plurality of candidate maintenance operations. The configuration automatically selects a maintenance operation from the candidate maintenance operations to automate based on the cost-benefit analysis of the one or more candidate maintenance operations. The selected maintenance operation is automated and scheduled on the data table.

80.

Invention publication
RETRIEVAL AND CACHING OF OBJECT METADATA ACROSS DATA SOURCES AND STORAGE SYSTEMS (Under examination, published)

Publication (announcement) number: US20240346007A1

Publication (announcement) date: 2024-10-17

Application number: US18135078

Application date: 2023-04-14

Applicant: Databricks, Inc.

Inventor: Zhaoxing Li, Rayman Preet Singh, Fuat Can Efeoglu, Daniel Tenedorio, Sarah Cai

IPC: G06F16/23, G06F16/2455

CPC classification number: G06F16/2365, G06F16/24552

Abstract: A system for retrieving and caching metadata from a remote data source is described. The system may receive a request from a client device. The request is to perform a query operation on a set of data objects stored in the remote data source. The system may access a metadata cache storing metadata information on one or more data objects of the remote data source and identify metadata corresponding to the set of data objects for the query operation in the metadata cache. The system may determine whether the identified metadata for the set of data objects meets an update condition. In response to the identified metadata meeting the update condition, the system may fetch updated metadata for at least the set of data objects from the remote data source, and store the updated metadata in the metadata cache.
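A metadata cache that refreshes entries when an update condition is met could look like the Python sketch below. The abstract does not say what the update condition is, so this sketch assumes a simple maximum-age rule. The class name, the `fetch_remote` callback (assumed to return metadata for every requested name), and the injectable clock are all hypothetical.

```python
import time
from typing import Callable


class MetadataCache:
    """Caches object metadata and refetches entries that are missing or stale."""

    def __init__(self, fetch_remote: Callable[[list], dict],
                 max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch_remote = fetch_remote   # stand-in for the remote data source
        self.max_age = max_age_seconds     # assumed update condition: entry age
        self.clock = clock
        self.entries: dict = {}            # name -> (metadata, fetched_at)

    def needs_update(self, name: str, now: float) -> bool:
        """Update condition: the entry is missing or older than max_age."""
        if name not in self.entries:
            return True
        return now - self.entries[name][1] > self.max_age

    def get(self, object_names: list) -> dict:
        """Return metadata for the requested objects, refreshing when needed."""
        now = self.clock()
        stale = [n for n in object_names if self.needs_update(n, now)]
        if stale:
            for name, meta in self.fetch_remote(stale).items():
                self.entries[name] = (meta, now)
        return {n: self.entries[n][0] for n in object_names}
```

Passing the clock in as a parameter makes the age rule easy to exercise with a fake clock, without waiting for real time to pass.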
