Patent search ap:"Cloudera Page Inc."

1.

Invention application
METHODS AND APPARATUS FOR AN ADAPTIVE AND SERVICE LEVEL AGREEMENT AWARE PAGING SYSTEM (Active)

Publication number: US20250103229A1

Publication date: 2025-03-27

Application number: US18436810

Application date: 2024-02-08

Applicant: Cloudera, Inc.

Inventor: Yida Wu, Abhishek Rawat, Vincent Kulandaisamy

IPC: G06F3/06

Abstract: Examples disclosed herein include writing pages of data to blocks, the data associated with an operator; writing the blocks to a file based on a sequential arrangement of the data in the blocks; writing the file to a spill data store; and executing an instruction by programmable circuitry to batch read the blocks in sequential order from the spill data store to a local memory.
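
The write and read path in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed implementation: the function names, the length-prefixed block layout, the block and batch sizes, and the use of a local temporary directory as the "spill data store" are all hypothetical, and the service level agreement aspect of the title is not modeled.

```python
import os
import tempfile


def spill_pages(pages, path, pages_per_block=4):
    """Pack pages into blocks and write the blocks to one file in sequential order.

    Hypothetical layout: each block is a 4-byte big-endian length followed by
    the block bytes. Returns the number of blocks written.
    """
    blocks = 0
    with open(path, "wb") as f:
        for start in range(0, len(pages), pages_per_block):
            block = b"".join(pages[start:start + pages_per_block])
            f.write(len(block).to_bytes(4, "big"))
            f.write(block)
            blocks += 1
    return blocks


def batch_read_blocks(path, blocks_per_batch=2):
    """Read the spilled blocks back in file order, several blocks per batch."""
    with open(path, "rb") as f:
        batch = []
        while header := f.read(4):
            batch.append(f.read(int.from_bytes(header, "big")))
            if len(batch) == blocks_per_batch:
                yield batch
                batch = []
        if batch:  # final, possibly smaller batch
            yield batch


# Usage: ten 8-byte pages belonging to one (hypothetical) operator.
pages = [bytes([i]) * 8 for i in range(10)]
with tempfile.TemporaryDirectory() as spill_dir:  # stand-in for the spill data store
    spill_path = os.path.join(spill_dir, "operator-0.spill")
    n_blocks = spill_pages(pages, spill_path)
    batches = list(batch_read_blocks(spill_path))
restored = b"".join(block for batch in batches for block in batch)
```

Because the blocks are laid out in the order of the data, the reader never seeks: it only moves forward through the file, which is what makes batched sequential reads possible.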

2.

Invention grant
Database compaction in distributed data system (Active)

Publication number: US12169507B2

Publication date: 2024-12-17

Application number: US16424083

Application date: 2019-05-28

Applicant: Cloudera, Inc.

Inventor: Todd Lipcon

IPC: G06F16/27

Abstract: A compaction policy imposing soft limits to optimize system efficiency is used to select various rowsets on which to perform compaction, each rowset storing keys within an interval called a keyspace. For example, the disclosed compaction policy results in a decrease in a height of the tablet, removes overlapping rowsets, and creates smaller sized rowsets. The compaction policy is based on the linear relationship shared between the keyspace height and the cost associated with performing an operation (e.g., an insert operation) in that keyspace. Accordingly, various factors determining which rowsets are to be compacted, how large the compacted rowsets are to be made, and when to perform the compaction, are considered within the disclosed compaction policy. Furthermore, a system and method for performing compaction on the selected datasets in a log-structured database is also provided.

3.

Invention grant
Distinct value estimation for query planning (Active)

Publication number: US12105712B2

Publication date: 2024-10-01

Application number: US18305715

Application date: 2023-04-24

Applicant: Cloudera, Inc.

Inventor: Alexander Behm, Mostafa Mokhtar

IPC: G06F16/2453, G06F16/2458, G06F16/835

CPC classification number: G06F16/24545, G06F16/24547, G06F16/2471, G06F16/8373, G06F16/24549

Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.
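
The four steps in this abstract can be illustrated with a small sketch. Several choices here are assumptions made for the example and are not taken from the patent: the function name `estimate_ndv`, the use of exact distinct counts on growing sample prefixes in place of probabilistic estimates, and the power-law curve fitted by least squares in log-log space.

```python
import math


def estimate_ndv(sample, total_rows, steps=8):
    """Estimate the number of distinct values among total_rows values from a sample.

    Assumptions for this sketch: exact distinct counts on sample prefixes stand
    in for the intermediate estimates, and the fitted function is the power law
    distinct(size) = exp(a) * size ** b.
    """
    if steps < 2 or len(sample) < steps:
        raise ValueError("need at least `steps` sample values and steps >= 2")

    # Steps 1 and 2: intermediate estimates paired with their sample sizes.
    xs, ys = [], []
    for k in range(1, steps + 1):
        size = len(sample) * k // steps
        xs.append(math.log(size))
        ys.append(math.log(len(set(sample[:size]))))

    # Step 3: least-squares line through the points in log-log space.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    a = mean_y - b * mean_x

    # Step 4: extrapolate the fitted curve to the full dataset size, then clamp
    # to the feasible range [distinct values seen in the sample, total_rows].
    extrapolated = round(math.exp(a + b * math.log(total_rows)))
    return min(total_rows, max(len(set(sample)), extrapolated))


# Usage: a sample in which every value is unique, and one with a single value.
all_unique = estimate_ndv(list(range(1000)), total_rows=10_000)
single_value = estimate_ndv([7] * 1000, total_rows=10_000)
```

With the all-unique sample the fitted curve grows linearly with sample size, so the extrapolation lands at about the total row count; with the single-value sample the curve is flat and the estimate stays at 1.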

4.

Invention publication
INTERACTIVE IDENTIFICATION OF SIMILAR SQL QUERIES (Pending: published)

Publication number: US20230350906A1

Publication date: 2023-11-02

Application number: US18127322

Application date: 2023-03-28

Applicant: Cloudera, Inc.

Inventor: Rituparna Agrawal, Anupam Singh, Prithviraj Pandian

IPC: G06F16/248, G06F16/84, G06F16/21, G06F16/28, G06F16/2455

CPC classification number: G06F16/248, G06F16/86, G06F16/211, G06F16/285, G06F16/2455

Abstract: Systems and methods for very fast grouping of “similar” SQL queries according to user-supplied similarity criteria. The user-supplied similarity criteria include a threshold quantifying the degree of similarity between SQL queries and common artifacts included in the queries. A similarity-characterizing data structure allows for the very fast grouping of “similar” SQL queries. Because the computation is distributed among multiple compute nodes, a small cluster of compute nodes takes a short time to compute the similarity-characterizing data on a workload of tens of millions of queries. The user can supply the similarity criteria through a UI or a command line tool. Furthermore, the user can adjust the degree of similarity by supplying new similarity criteria. Accordingly, the system can display in real time or near real time, updated SQL groupings corresponding to the newly supplied similarity criteria using the originally computed similarity-characterizing data structure.

5.

Invention publication
DISTINCT VALUE ESTIMATION FOR QUERY PLANNING (Pending: published)

Publication number: US20230350894A1

Publication date: 2023-11-02

Application number: US18305715

Application date: 2023-04-24

Applicant: Cloudera, Inc.

Inventor: Alexander Behm, Mostafa Mokhtar

IPC: G06F16/2453, G06F16/2458, G06F16/835

CPC classification number: G06F16/24545, G06F16/2471, G06F16/8373, G06F16/24547, G06F16/24549

Abstract: The problem of distinct value estimation has many applications, but is particularly important in the field of database technology where such information is utilized by query planners to generate and optimize query plans. Introduced is a novel technique for estimating the number of distinct values in a given dataset without scanning all of the values in the dataset. In an example embodiment, the introduced technique includes 1) gathering multiple intermediate probabilistic estimates based on varying samples of the dataset, 2) plotting the multiple intermediate probabilistic estimates against indications of sample size, 3) fitting a function to the plotted data points, and 4) determining an overall distinct value estimate by extrapolating the objective function to an estimated or known total number of values in the dataset.

6.

Invention grant
Design-time information based on run-time artifacts in a distributed computing cluster (Active)

Publication number: US11663033B2

Publication date: 2023-05-30

Application number: US17179155

Application date: 2021-02-18

Applicant: Cloudera, Inc.

Inventor: Vikas Singh, Sudhanshu Arora, Philip Zeyliger, Marcelo Masiero Vanzin, Chang She

IPC: G06F9/46, G06F16/14, G06F16/21

CPC classification number: G06F9/46, G06F16/211, G06F16/14

Abstract: Techniques are disclosed for inferring design-time information based on run-time artifacts generated by services operating in a distributed computing cluster. In an embodiment, a metadata system extracts metadata including run-time artifacts generated by services in a distributed computing cluster while processing a workflow including multiple jobs. The extracted metadata is processed to identify entities and entity relationships which can then be used to generate lineage information. Using the lineage information, the metadata system can infer design-time information associated with the workflow. The inferred design-time information can then be utilized to, for example, recreate the workflow, recreate previous versions of the workflow, optimize the workflow, etc.

7.

Invention grant
Merging multiple sorted lists in a distributed computing system (Active)

Publication number: US11301210B2

Publication date: 2022-04-12

Application number: US16775141

Application date: 2020-01-28

Applicant: Cloudera, Inc.

Inventor: Adar Lieber-Dembo, Todd Lipcon

IPC: G06F7/08, G06F16/22

Abstract: A technique is described for merging multiple lists of ordinal elements such as keys into a sorted output. In an example embodiment, a merge window is defined, based on the bounds of the multiple lists of ordinal elements, that is representative of a portion of an overall element space associated with the multiple lists. Lists of elements to be sorted can be placed into one of at least two different heaps based on whether they overlap the merge window. For example, lists that overlap the merge window may be placed into an active or “hot” heap, while lists that do not overlap the merge window may be placed into a separate inactive or “cold” heap. A sorted output can then be generated by iteratively processing the active heap. As the processing of the active heap progresses, the merge window advances, and lists may move between the active and inactive heaps.
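
The hot and cold heap idea in this abstract can be sketched as below. This is a minimal illustration, not the patented implementation: the function name `merge_sorted_lists` is hypothetical, and the sketch assumes in-memory Python lists and a merge window whose upper end is the smallest last element among the currently active lists.

```python
import heapq


def merge_sorted_lists(lists):
    """Merge sorted lists using an active ("hot") and an inactive ("cold") heap."""
    lists = [lst for lst in lists if lst]  # ignore empty inputs
    # Cold heap: lists not yet overlapping the window, keyed by first element.
    cold = [(lst[0], i) for i, lst in enumerate(lists)]
    heapq.heapify(cold)
    hot = []    # entries: (next element, list index, position in list)
    ends = []   # min-heap of (last element, list index) for activated lists
    exhausted = set()
    out = []

    def activate():
        """Move the cold list with the smallest first element into the hot heap."""
        first, i = heapq.heappop(cold)
        heapq.heappush(hot, (first, i, 0))
        heapq.heappush(ends, (lists[i][-1], i))

    while hot or cold:
        if not hot:
            activate()  # open a new window at the smallest remaining element
        # Drop upper bounds that belong to fully consumed lists.
        while ends and ends[0][1] in exhausted:
            heapq.heappop(ends)
        window_end = ends[0][0]
        # Cold lists that start inside the window become hot.
        while cold and cold[0][0] <= window_end:
            activate()
        # The smallest element overall is always at the top of the hot heap.
        value, i, pos = heapq.heappop(hot)
        out.append(value)
        if pos + 1 < len(lists[i]):
            heapq.heappush(hot, (lists[i][pos + 1], i, pos + 1))
        else:
            exhausted.add(i)  # the window may advance on the next iteration
    return out


# Usage
merged = merge_sorted_lists([[1, 4, 7], [2, 3], [10, 11], [5, 12], []])
```

The benefit over a single heap is that lists starting beyond the window are never compared element by element until the window reaches them, so the hot heap stays small when the inputs overlap only a little.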

8.

Invention application
MERGING MULTIPLE SORTED LISTS IN A DISTRIBUTED COMPUTING SYSTEM (Active)

Publication number: US20210141602A1

Publication date: 2021-05-13

Application number: US16775141

Application date: 2020-01-28

Applicant: Cloudera, Inc.

Inventor: Adar Lieber-Dembo, Todd Lipcon

IPC: G06F7/08, G06F16/22

Abstract: A technique is described for merging multiple lists of ordinal elements such as keys into a sorted output. In an example embodiment, a merge window is defined, based on the bounds of the multiple lists of ordinal elements, that is representative of a portion of an overall element space associated with the multiple lists. Lists of elements to be sorted can be placed into one of at least two different heaps based on whether they overlap the merge window. For example, lists that overlap the merge window may be placed into an active or “hot” heap, while lists that do not overlap the merge window may be placed into a separate inactive or “cold” heap. A sorted output can then be generated by iteratively processing the active heap. As the processing of the active heap progresses, the merge window advances, and lists may move between the active and inactive heaps.

9.

Invention application
ENSURING PROPERLY ORDERED EVENTS IN A DISTRIBUTED COMPUTING ENVIRONMENT (Pending: published)

Publication number: US20200304610A1

Publication date: 2020-09-24

Application number: US16895947

Application date: 2020-06-08

Applicant: Cloudera, Inc.

Inventor: David Alves, Todd Lipcon

IPC: H04L29/06, G06Q10/00, H04L29/08, G06Q50/26, G06F1/14

Abstract: A first event occurs at a first computer at a first time, as measured by a local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by a local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, that is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, and the alternate second time is obtained from the second computer, and the order of the events is determined based on a comparison of the two times.
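
The timestamp adjustment described in this abstract can be sketched as follows. The `Node` class, the integer clock ticks, and the choice of "sender's time plus one tick" as the alternate time are assumptions made for illustration; the abstract only says that the alternate time is based on the first time.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical computer with a local clock that may be skewed."""
    name: str
    clock: int  # current local clock reading, in arbitrary ticks

    def local_event(self) -> int:
        """Timestamp an event with the local clock."""
        return self.clock

    def receive(self, sender_time: int) -> int:
        """Timestamp an event that was triggered by a message carrying sender_time.

        If the local clock is not ahead of the sender's timestamp, use an
        alternate time derived from the sender's timestamp (assumed here to be
        sender_time + 1) so that the cause is ordered before the effect.
        """
        if sender_time >= self.clock:
            return sender_time + 1
        return self.clock


# Usage: computer B's clock (100) lags behind computer A's clock (105).
a = Node("A", clock=105)
b = Node("B", clock=100)
first_time = a.local_event()         # first event on A
second_time = b.receive(first_time)  # second event on B, caused by A's message
```

A third system comparing `first_time` and `second_time` now sees the first event before the second, even though B's raw clock reading was earlier than A's.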

10.

Invention application
DESIGN-TIME INFORMATION BASED ON RUN-TIME ARTIFACTS IN A DISTRIBUTED COMPUTING CLUSTER (Pending: published)

Publication number: US20200065136A1

Publication date: 2020-02-27

Application number: US16667609

Application date: 2019-10-29

Applicant: Cloudera, Inc.

Inventor: Vikas Singh, Sudhanshu Arora, Philip Zeyliger, Marcelo Masiero Vanzin, Chang She

IPC: G06F9/46, G06F16/21

Abstract: Techniques are disclosed for inferring design-time information based on run-time artifacts generated by services operating in a distributed computing cluster. In an embodiment, a metadata system extracts metadata including run-time artifacts generated by services in a distributed computing cluster while processing a workflow including multiple jobs. The extracted metadata is processed to identify entities and entity relationships which can then be used to generate lineage information. Using the lineage information, the metadata system can infer design-time information associated with the workflow. The inferred design-time information can then be utilized to, for example, recreate the workflow, recreate previous versions of the workflow, optimize the workflow, etc.
