-
Publication No.: US20230177072A1
Publication Date: 2023-06-08
Application No.: US18162625
Filing Date: 2023-01-31
Applicant: Databricks, Inc.
Inventor: Mani Parkhe, Clemens Mewald, Matei Zaharia, Avesh Singh
CPC classification number: G06F16/288, G06F30/27
Abstract: The present application discloses a method, system, and computer system for managing a plurality of features and storing lineage information pertaining to the features. The method includes obtaining one or more datasets, determining a first feature, wherein the first feature is determined based at least in part on the one or more datasets, and storing the first feature in a feature store. The first feature is stored in association with a dataset indication of the one or more datasets from which the first feature is determined. The feature store comprises a plurality of features.
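The lineage idea in this abstract, a feature stored together with an indication of the datasets it was computed from, can be illustrated with a small in-memory sketch. The `FeatureStore` and `FeatureRecord` classes and the `total_spend` feature are hypothetical illustrations, not Databricks Feature Store APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FeatureRecord:
    """Hypothetical feature entry: computed values plus their lineage."""
    name: str
    values: Dict[str, float]
    source_datasets: Tuple[str, ...]  # the "dataset indication"


@dataclass
class FeatureStore:
    """Hypothetical in-memory store holding a plurality of features."""
    features: Dict[str, FeatureRecord] = field(default_factory=dict)

    def register(self, name: str, datasets: Dict[str, list],
                 compute: Callable[..., Dict[str, float]]) -> FeatureRecord:
        # Determine the feature from the input datasets...
        values = compute(*datasets.values())
        # ...and store it in association with the names of those datasets.
        record = FeatureRecord(name, values, tuple(sorted(datasets)))
        self.features[name] = record
        return record

    def lineage(self, name: str) -> Tuple[str, ...]:
        return self.features[name].source_datasets

    def derived_from(self, dataset: str) -> List[str]:
        # Reverse lookup: which stored features depend on a given dataset.
        return sorted(n for n, r in self.features.items() if dataset in r.source_datasets)


def total_spend(orders: list) -> Dict[str, float]:
    # Hypothetical feature: total order amount per customer.
    totals: Dict[str, float] = {}
    for customer_id, amount in orders:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals


store = FeatureStore()
store.register("total_spend", {"orders": [("c1", 10.0), ("c2", 5.0), ("c1", 2.5)]}, total_spend)
print(store.lineage("total_spend"))  # ('orders',)
```

Keeping the dataset names on each feature is what makes the reverse lookup possible: when a source dataset changes, every feature derived from it can be found without recomputing anything.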
-
Publication No.: US20230161767A1
Publication Date: 2023-05-25
Application No.: US18158258
Filing Date: 2023-01-23
Applicant: Databricks, Inc.
Inventor: Shi Xin, Alexander Behm, Shoumik Palkar, Herman Rudolf Petrus Catharina van Hovell tot Westerflier
IPC: G06F16/2453, G06F16/2458, G06F16/25
CPC classification number: G06F16/24542, G06F16/258, G06F16/2471
Abstract: A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
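One way to read the node-classification step: each node is assigned to the second engine only if that engine can execute it, and adjacent nodes with the same engine type are then grouped so that data crosses between engines only at stage boundaries. The supported-operator set, the operator names, and the linear (non-tree) plan are simplifying assumptions for this sketch, not the patented planner.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical capability list for the second engine.
SECOND_ENGINE_OPS = {"scan", "filter", "project", "hash_aggregate"}


@dataclass
class Node:
    op: str


def node_type(node: Node) -> str:
    # The node type depends on whether the node can run in the second engine.
    return "second" if node.op in SECOND_ENGINE_OPS else "first"


def generate_plan(nodes: List[Node]) -> List[Tuple[str, List[str]]]:
    """Group consecutive nodes of the same engine type into stages."""
    plan: List[Tuple[str, List[str]]] = []
    for node in nodes:
        engine = node_type(node)
        if plan and plan[-1][0] == engine:
            plan[-1][1].append(node.op)
        else:
            plan.append((engine, [node.op]))
    return plan


query = [Node("scan"), Node("filter"), Node("python_udf"), Node("project")]
for engine, ops in generate_plan(query):
    print(engine, ops)
```

Here the unsupported `python_udf` node splits the plan into three stages, so the second engine handles the scan and filter, hands off to the first engine, and resumes for the final projection.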
-
Publication No.: US20220374532A1
Publication Date: 2022-11-24
Application No.: US17514982
Filing Date: 2021-10-29
Applicant: Databricks Inc.
Inventor: Matei Zaharia, David Lewis, Cheng Lian, Yuchen Huo, Ali Ghodsi
Abstract: The present application discloses a method, system, and computer system for providing access to information stored on a system for data storage. The method includes receiving a data request from a user, determining data corresponding to the data request, determining whether the user has requisite permissions to access the data, and in response to determining that the user has requisite permissions to access the data: determining a manner by which to provide access to the data, wherein the data comprises a filtered subset of stored data, and generating a token based at least in part on the user and the manner by which access to the data is to be provided.
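A minimal sketch of this request flow, assuming that the "manner of access" is a table name, a row filter defining the permitted subset, and an expiry time, and that the token is an HMAC-signed payload. The grant table, the signing key, and the token layout are hypothetical; the abstract does not specify a token format.

```python
import base64
import hashlib
import hmac
import json
from typing import Dict, List, Optional, Tuple

SECRET = b"hypothetical-signing-key"  # assumption: server-side secret

# Hypothetical grants: (user, table) -> equality filter that defines the visible subset.
GRANTS: Dict[Tuple[str, str], Dict[str, str]] = {("alice", "sales"): {"region": "EU"}}


def _sign(payload: bytes) -> str:
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def issue_token(user: str, table: str, expires_at: int) -> Optional[str]:
    row_filter = GRANTS.get((user, table))
    if row_filter is None:
        return None  # user lacks the requisite permissions: no token
    # The manner of access: which table, which filtered subset, until when.
    manner = {"user": user, "table": table, "filter": row_filter, "exp": expires_at}
    payload = json.dumps(manner, sort_keys=True).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def read_with_token(token: str, rows: List[dict], now: int) -> List[dict]:
    encoded, signature = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(encoded)
    if not hmac.compare_digest(signature, _sign(payload)):
        raise PermissionError("invalid token signature")
    manner = json.loads(payload)
    if now >= manner["exp"]:
        raise PermissionError("token expired")
    # Serve only the filtered subset the token was issued for.
    return [r for r in rows if all(r.get(k) == v for k, v in manner["filter"].items())]


sales = [{"region": "EU", "amount": 10}, {"region": "US", "amount": 7}]
token = issue_token("alice", "sales", expires_at=2000)
print(read_with_token(token, sales, now=1000))  # [{'region': 'EU', 'amount': 10}]
print(issue_token("bob", "sales", expires_at=2000))  # None
```

Because the filter travels inside the signed token, the serving side can enforce the subset without consulting the grant table again, while the signature check prevents a caller from widening the filter.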
-
Publication No.: US10769130B1
Publication Date: 2020-09-08
Application No.: US15987215
Filing Date: 2018-05-23
Applicant: Databricks Inc.
Inventor: Michael Paul Armbrust, Shixiong Zhu, Burak Yavuz
Abstract: A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update for the transaction to the transaction log associated with a further position N+2.
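The commit protocol in this abstract is a form of optimistic concurrency control. The sketch below models the log as an in-memory map from position to the set of files each commit updated (assumed here to mean files added or removed); a real implementation needs an atomic put-if-absent on the underlying storage, which is only simulated. Class and function names are hypothetical, and the metadata-state check is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


class ConflictError(Exception):
    """Raised when a concurrent commit updated files this transaction read."""


@dataclass
class TransactionLog:
    # position -> set of files updated (added or removed) by the commit there
    entries: Dict[int, FrozenSet[str]] = field(default_factory=dict)

    def current_position(self) -> int:
        return max(self.entries, default=-1)

    def try_write(self, position: int, files: FrozenSet[str]) -> bool:
        # Simulated atomic put-if-absent; real systems rely on the storage layer.
        if position in self.entries:
            return False
        self.entries[position] = files
        return True


def commit(log: TransactionLog, read_position: int, read_set: Set[str],
           updates: FrozenSet[str]) -> int:
    position = read_position + 1  # N + 1
    while not log.try_write(position, updates):
        # A simultaneous transaction already holds this position: compare the
        # files it updated against what this transaction read.
        if read_set & log.entries[position]:
            raise ConflictError(f"read set overlaps commit at position {position}")
        position += 1  # no overlap: retry at N + 2, N + 3, ...
    return position


log = TransactionLog({0: frozenset({"a.parquet", "b.parquet"})})
n = log.current_position()  # every transaction below reads snapshot N = 0
print(commit(log, n, {"b.parquet"}, frozenset({"b.parquet", "b2.parquet"})))  # 1
print(commit(log, n, {"a.parquet"}, frozenset({"a.parquet", "a2.parquet"})))  # 2
```

The second transaction loses the race for position 1 but succeeds at position 2 because it read only `a.parquet`; a third transaction that had read `b.parquet` at N = 0 would raise `ConflictError` instead.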
-
Publication No.: US20250165324A1
Publication Date: 2025-05-22
Application No.: US19030032
Filing Date: 2025-01-17
Applicant: Databricks, Inc.
Inventor: Alicja Luszczak, Srinath Shankar, Shi Xin
Abstract: A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
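As an illustration, the watchdog criterion could be an upper bound on the number of output rows generated while processing a single input record, which catches runaway expansions such as an exploding join or UDF. The criterion, the generator-based processing model, and the `run_with_watchdog` helper are assumptions made for this sketch; the abstract leaves the criterion open, and closing a generator here stands in for killing a task.

```python
from typing import Callable, Generator, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_watchdog(records: Iterable[T],
                      process: Callable[[T], Generator[R, None, None]],
                      max_output_rows: int) -> Tuple[List[R], List[T]]:
    results: List[R] = []
    killed: List[T] = []
    for record in records:
        outputs = process(record)  # lazily yields output rows for this record
        produced: List[R] = []
        for row in outputs:
            produced.append(row)
            if len(produced) > max_output_rows:  # watchdog criterion satisfied
                outputs.close()  # stop processing this data instance
                killed.append(record)
                break
        else:
            results.extend(produced)  # completed within the limit
    return results, killed


def explode(n: int) -> Generator[Tuple[int, int], None, None]:
    # Hypothetical per-record processing that emits n rows.
    for i in range(n):
        yield (n, i)


results, killed = run_with_watchdog([2, 10_000_000, 3], explode, max_output_rows=100)
print(len(results), killed)  # 5 [10000000]
```

Only the offending record is abandoned after 101 rows; the rest of the job continues, which matches the abstract's focus on killing the processing of one data instance rather than the whole cluster job.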
-
Publication No.: US20250156448A1
Publication Date: 2025-05-15
Application No.: US19022884
Filing Date: 2025-01-15
Applicant: Databricks, Inc.
Inventor: Terry Kim, Lin Ma, Rahul Shivu Mahadev, Rahul Potharaju
Abstract: The disclosed configurations provide a method (and/or a computer-readable medium or system) for determining, from a table schema describing keys of a data table, one or more clustering keys that can be used to cluster data files of the data table. The method includes generating features for the data table, generating tokens from the features, generating a prediction for each token by applying to the token a machine-learned transformer model trained to predict a likelihood that the key associated with the token is a clustering key for the data table, determining clustering keys based on the predictions, and clustering data records of the data table into data files based on key-values for the clustering keys.
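The last two steps of this method, choosing clustering keys from per-key predictions and clustering records into data files by their key values, can be sketched as follows. The prediction scores are hard-coded stand-ins for the transformer model's output, and the threshold, the key limit, and the plain lexicographic sort are assumptions; multi-key layouts are often built with space-filling curves such as Z-order instead.

```python
from typing import Dict, List, Sequence


def choose_clustering_keys(predictions: Dict[str, float],
                           threshold: float = 0.5, max_keys: int = 4) -> List[str]:
    # Keep keys whose predicted likelihood clears the threshold, best first.
    ranked = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    return [key for key, score in ranked if score >= threshold][:max_keys]


def cluster_into_files(records: Sequence[dict], keys: List[str],
                       rows_per_file: int) -> List[List[dict]]:
    # Order records by their clustering-key values, then cut into fixed-size
    # files so that each file covers a narrow range of key values.
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in keys))
    return [ordered[i:i + rows_per_file] for i in range(0, len(ordered), rows_per_file)]


# Stand-in for the model's per-key predictions (hypothetical values).
predictions = {"event_date": 0.91, "customer_id": 0.74, "comment": 0.03}
keys = choose_clustering_keys(predictions)
records = [
    {"event_date": "2024-01-02", "customer_id": 7, "comment": "x"},
    {"event_date": "2024-01-01", "customer_id": 9, "comment": "y"},
    {"event_date": "2024-01-01", "customer_id": 3, "comment": "z"},
]
files = cluster_into_files(records, keys, rows_per_file=2)
print(keys)  # ['event_date', 'customer_id']
print([[r["customer_id"] for r in f] for f in files])  # [[3, 9], [7]]
```

The free-text `comment` column scores low and is excluded, so files are laid out by date and then customer, which is what lets a query filtering on those keys skip whole files.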
-
Publication No.: US20250156397A1
Publication Date: 2025-05-15
Application No.: US18983280
Filing Date: 2024-12-16
Applicant: Databricks, Inc.
Inventor: Zhaoxing Li, Rayman Preet Singh, Fuat Can Efeoglu, Daniel Tenedorio, Sarah Cai
IPC: G06F16/23, G06F16/2455
Abstract: A system for retrieving and caching metadata from a remote data source is described. The system may receive a request from a client device. The request is to perform a query operation on a set of data objects stored in the remote data source. The system may access a metadata cache storing metadata information on one or more data objects of the remote data source and identify metadata corresponding to the set of data objects for the query operation in the metadata cache. The system may determine whether the identified metadata for the set of data objects meets an update condition. In response to the identified metadata meeting the update condition, the system may fetch updated metadata for at least the set of data objects from the remote data source, and store the updated metadata in the metadata cache.
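A minimal sketch of the cache flow, assuming the "update condition" is simply that an entry is missing or older than a time-to-live. The `MetadataCache` class, the `fake_remote` function standing in for the remote data source, and the injectable clock are hypothetical; the abstract does not say what the update condition is.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class CachedEntry:
    metadata: dict
    fetched_at: float


class MetadataCache:
    """Hypothetical cache in front of a remote source's metadata endpoint."""

    def __init__(self, fetch_remote: Callable[[List[str]], Dict[str, dict]],
                 ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_remote = fetch_remote
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, CachedEntry] = {}

    def _meets_update_condition(self, entry: Optional[CachedEntry], now: float) -> bool:
        # Assumed update condition: not cached yet, or older than the TTL.
        return entry is None or now - entry.fetched_at > self._ttl_s

    def metadata_for_query(self, objects: Iterable[str]) -> Dict[str, dict]:
        objects = list(objects)
        now = self._clock()
        stale = [o for o in objects if self._meets_update_condition(self._entries.get(o), now)]
        if stale:
            # Fetch only what is missing or stale, then refresh the cache.
            for name, md in self._fetch_remote(stale).items():
                self._entries[name] = CachedEntry(md, now)
        return {o: self._entries[o].metadata for o in objects}


remote_calls: List[List[str]] = []


def fake_remote(names: List[str]) -> Dict[str, dict]:
    # Stand-in for the remote data source; records each call it receives.
    remote_calls.append(list(names))
    return {n: {"name": n, "version": len(remote_calls)} for n in names}


now = [0.0]
cache = MetadataCache(fake_remote, ttl_s=60, clock=lambda: now[0])
cache.metadata_for_query(["orders", "customers"])  # both fetched
now[0] = 30.0
cache.metadata_for_query(["orders"])  # fresh: served from cache
now[0] = 90.0
cache.metadata_for_query(["orders"])  # stale: fetched again
print(remote_calls)  # [['orders', 'customers'], ['orders']]
```

Fetching only the stale subset keeps round trips to the remote source proportional to what actually changed, which is the main point of caching metadata for federated queries.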
-
Publication No.: US12277237B2
Publication Date: 2025-04-15
Application No.: US17514982
Filing Date: 2021-10-29
Applicant: Databricks, Inc.
Inventor: Matei Zaharia, David Lewis, Cheng Lian, Yuchen Huo, Ali Ghodsi
Abstract: The present application discloses a method, system, and computer system for providing access to information stored on a system for data storage. The method includes receiving a data request from a user, determining data corresponding to the data request, determining whether the user has requisite permissions to access the data, and in response to determining that the user has requisite permissions to access the data: determining a manner by which to provide access to the data, wherein the data comprises a filtered subset of stored data, and generating a token based at least in part on the user and the manner by which access to the data is to be provided.
-
Publication No.: US20250094195A1
Publication Date: 2025-03-20
Application No.: US18368919
Filing Date: 2023-09-15
Applicant: Databricks, Inc.
Inventor: Aaron Daniel Davidson, Thomas Garnier, Lin Guo, Zhe He, Manlin Li, Yang Liu, Feng Wang, Hong Zhang, Weirong Zhu
Abstract: A resource management configuration may receive an API request from an API server. The API request specifies task information from a plurality of tenants. The configuration transmits status information of a plurality of VMs to the API server to assign tasks to one or more VMs based on the task information and the status information. Tasks assigned to a VM of the plurality of VMs are for one tenant of the plurality of tenants. The configuration configures, on an untrusted network, network security groups for managing communications of tenants such that a network security group configured for a tenant permits communications between VMs assigned to the same tenant but prevents communications between VMs assigned to different tenants. The configuration pins each assigned VM of the one or more assigned VMs to perform the task based on the task information of the corresponding tenant.
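A simplified model of the two isolation rules in this abstract: every VM that receives work is pinned to exactly one tenant, and the per-tenant security-group check admits traffic only between VMs pinned to the same tenant. The capacity limit, the `VM` type, and the free/pinned status model are assumptions for this sketch; real network security groups are configured in the cloud provider, not evaluated in application code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VM:
    vm_id: str
    tenant: Optional[str] = None  # status: None means free, else pinned tenant
    tasks: List[str] = field(default_factory=list)


def pick_vm(vms: List[VM], tenant: str, capacity: int) -> Optional[VM]:
    # Prefer a VM already pinned to this tenant that still has room...
    for vm in vms:
        if vm.tenant == tenant and len(vm.tasks) < capacity:
            return vm
    # ...otherwise take a free VM; never reuse another tenant's VM.
    for vm in vms:
        if vm.tenant is None:
            return vm
    return None


def assign(tasks_by_tenant: Dict[str, List[str]], vms: List[VM], capacity: int) -> None:
    for tenant, tasks in tasks_by_tenant.items():
        for task in tasks:
            vm = pick_vm(vms, tenant, capacity)
            if vm is None:
                raise RuntimeError(f"no VM available for tenant {tenant}")
            vm.tenant = tenant  # pin the VM to this tenant
            vm.tasks.append(task)


def nsg_allows(src: VM, dst: VM) -> bool:
    # Same-tenant traffic is permitted; cross-tenant traffic is blocked.
    return src.tenant is not None and src.tenant == dst.tenant


vms = [VM("vm1"), VM("vm2"), VM("vm3")]
assign({"acme": ["t1", "t2", "t3"], "globex": ["g1"]}, vms, capacity=2)
print([(vm.vm_id, vm.tenant, vm.tasks) for vm in vms])
print(nsg_allows(vms[0], vms[1]), nsg_allows(vms[0], vms[2]))  # True False
```

Tenant `acme` spills onto a second VM once `vm1` is full, and the two `acme` VMs can talk to each other while neither can reach the `globex` VM.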
-
Publication No.: US20250061378A1
Publication Date: 2025-02-20
Application No.: US18738025
Filing Date: 2024-06-09
Applicant: Databricks, Inc.
Inventor: Benjamin Thomas Wilson, Corey Zumar
IPC: G06N20/00, G06F18/20, G06F18/2132
Abstract: The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a dataset, the dataset comprising a plurality of keys and a plurality of key-value relationships, determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models, building the plurality of models, and optimizing at least one of the plurality of models.
-