Patent search ap:"Cloudera Page Inc."

21.

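Granted invention patent
Accumulating and flushing mutations in a column store [Active]

Publication (announcement) number: US12222915B2

Publication (announcement) date: 2025-02-11

Application number: US17314813

Application date: 2021-05-07

Applicant: Cloudera, Inc.

Inventor: Todd Lipcon

IPC: G06F16/22, G06F16/23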

Abstract: Columnar storage provides many performance and space saving benefits for analytic workloads, but previous mechanisms for handling single row update transactions in column stores suffer from poor performance. A columnar data layout facilitates both low-latency random access capabilities together with high-throughput analytical access capabilities, simplifying Hadoop architectures for use cases involving real-time data. In disclosed embodiments, mutations within a single row are executed atomically across columns and do not necessarily include the entirety of a row. This allows for faster updates without the overhead of reading or rewriting larger columns.
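
The idea of accumulating partial-row mutations and flushing them later can be illustrated with a minimal sketch. It is not the patented implementation: the `ColumnStore` class, its in-memory `deltas` map, and the method names are hypothetical, chosen only to show that an update needs to touch just the columns it changes.

```python
class ColumnStore:
    """Toy column store: base data per column plus a buffer of pending mutations."""

    def __init__(self, columns):
        # Base data: one list of values per column, indexed by row position.
        self.base = {name: list(values) for name, values in columns.items()}
        # Pending mutations: row index -> {column name: new value}.
        self.deltas = {}

    def update(self, row, changes):
        # Record only the changed columns; all of them land in one step,
        # so a reader never sees half of a multi-column update.
        self.deltas.setdefault(row, {}).update(changes)

    def read(self, row, column):
        # Reads overlay pending mutations on top of the base column.
        pending = self.deltas.get(row, {})
        return pending.get(column, self.base[column][row])

    def flush(self):
        # Apply accumulated mutations to the base columns and clear the buffer.
        for row, changes in self.deltas.items():
            for column, value in changes.items():
                self.base[column][row] = value
        self.deltas.clear()


store = ColumnStore({"name": ["ann", "bob"], "age": [30, 41]})
store.update(1, {"age": 42})   # the "name" column is neither read nor rewritten
print(store.read(1, "age"))    # value comes from the delta buffer
store.flush()
print(store.base["age"])       # value is now in the base column
```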

22.

Invention publication
ENSURING PROPERLY ORDERED EVENTS IN A DISTRIBUTED COMPUTING ENVIRONMENT [Under examination - published]

Publication (announcement) number: US20240015234A1

Publication (announcement) date: 2024-01-11

Application number: US18357021

Application date: 2023-07-21

Applicant: Cloudera, Inc.

Inventor: David Alves, Todd Lipcon

IPC: H04L69/28, G06Q10/00, H04L67/10, G06Q50/26, G06F1/14, G06F11/07

CPC classification number: H04L69/28, G06Q10/00, H04L67/10, G06Q50/26, G06F1/14, G06F11/0721, G06F11/0772

Abstract: A first event occurs at a first computer at a first time, as measured by a local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by a local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, that is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, and the alternate second time is obtained from the second computer, and the order of the events is determined based on a comparison of the two times.
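
The timestamp adjustment described in this abstract can be sketched in a few lines. This is an illustration under assumptions, not the claimed method: the function names are hypothetical, timestamps are plain integers, and "based on the first time" is modelled as `first_time + 1`. Ties are treated like the late case so that the two events never share a timestamp.

```python
def alternate_event_time(first_time: int, local_time: int) -> int:
    """Return the timestamp to record for the second event.

    first_time: timestamp carried in the message from the first computer.
    local_time: reading of the second computer's own clock.
    """
    if first_time >= local_time:
        # Clock error: the local clock is behind the sender's clock, so
        # substitute a time derived from the first time (assumed: +1).
        return first_time + 1
    return local_time


def first_happened_before(first_time: int, second_time: int) -> bool:
    """What a third system would do: compare the two recorded times."""
    return first_time < second_time


first = 105                                   # first computer's clock runs ahead
second = alternate_event_time(first, local_time=100)
print(second, first_happened_before(first, second))
```

With the raw local reading of 100, the comparison would have placed the second event before the first; with the alternate time the causal order is preserved.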

23.

Granted invention patent
Apparatus and method for accelerated query processing using eager aggregation and analytical view matching [Active]

Publication (announcement) number: US11341134B2

Publication (announcement) date: 2022-05-24

Application number: US16989687

Application date: 2020-08-10

Applicant: Cloudera, Inc.

Inventor: Anjali Betawadkar-Norwood, Priyank Patel

IPC: G06F16/2453, G06F16/2458, G06F16/27

Abstract: A system comprises a computer network and worker machines connected to the computer network. The worker machines store partitions of a distributed database. A master machine is connected to the computer network. The master machine includes a query processor to identify a star query that references a fact table and related dimension tables that characterize attributes of facts in the fact table. Eager aggregation is applied to a query plan associated with the star query. The eager aggregation alters the query plan by moving an aggregation operation before a join operation to form an eager aggregated query plan. An analytical view with data responsive to the eager aggregated query plan is identified. The eager aggregated query plan is revised to form a final query plan. The final query plan references the analytical view. The final query plan is executed to produce query results.
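
Eager aggregation, as named in this abstract, moves an aggregation ahead of a join. The sketch below shows only that rewrite on a tiny in-memory star schema; the table contents and function names are invented for illustration, and the analytical-view matching step is not modelled.

```python
from collections import defaultdict

# Hypothetical fact table: (store_id, amount)
sales = [(1, 10.0), (1, 5.0), (2, 7.0), (3, 3.0), (2, 1.0)]
# Hypothetical dimension table: store_id -> region
stores = {1: "West", 2: "East", 3: "West"}


def join_then_aggregate(facts, dim):
    """Baseline plan: join every fact row to its dimension row, then SUM by region."""
    totals = defaultdict(float)
    for store_id, amount in facts:
        totals[dim[store_id]] += amount
    return dict(totals)


def eager_aggregate(facts, dim):
    """Eager plan: SUM by the join key first, then join the smaller partial result."""
    per_store = defaultdict(float)
    for store_id, amount in facts:
        per_store[store_id] += amount        # aggregation moved before the join
    totals = defaultdict(float)
    for store_id, subtotal in per_store.items():
        totals[dim[store_id]] += subtotal    # join, then final aggregation
    return dict(totals)


print(join_then_aggregate(sales, stores))
print(eager_aggregate(sales, stores))
```

Both plans return the same totals per region, but the eager plan joins one row per store instead of one row per sale, which is where the saving would come from on a large fact table.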

24.

Invention application
UTILIZATION-AWARE RESOURCE SCHEDULING IN A DISTRIBUTED COMPUTING CLUSTER [Active]

Publication (announcement) number: US20210349755A1

Publication (announcement) date: 2021-11-11

Application number: US17379742

Application date: 2021-07-19

Applicant: Cloudera, Inc.

Inventor: Karthik Kambatla

IPC: G06F9/48, G06F9/50

Abstract: Embodiments are disclosed for a utilization-aware approach to cluster scheduling, to address this resource fragmentation and to improve cluster utilization and job throughput. In some embodiments a resource manager at a master node considers actual usage of running tasks and schedules opportunistic work on underutilized worker nodes. The resource manager monitors resource usage on these nodes and preempts opportunistic containers in the event this over-subscription becomes untenable. In doing so, the resource manager effectively utilizes wasted resources, while minimizing adverse effects on regularly scheduled tasks.
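
A rough sketch of the schedule-then-preempt loop this abstract describes follows. Everything concrete in it is an assumption made for illustration: the `Node` fields, the 80% `OVERSUBSCRIBE_LIMIT`, and the newest-first preemption order are not taken from the patent.

```python
from dataclasses import dataclass, field

OVERSUBSCRIBE_LIMIT = 0.8  # hypothetical: keep actual usage under 80% of capacity


@dataclass
class Node:
    capacity: float
    allocated: float = 0.0   # resources reserved by regularly scheduled tasks
    used: float = 0.0        # resources actually measured in use
    opportunistic: list = field(default_factory=list)  # sizes of opportunistic containers


def try_schedule_opportunistic(node: Node, request: float) -> bool:
    """Place extra work on a node whose actual usage leaves room, even if fully allocated."""
    if node.used + request <= OVERSUBSCRIBE_LIMIT * node.capacity:
        node.opportunistic.append(request)
        node.used += request
        return True
    return False


def preempt_if_needed(node: Node) -> list:
    """Preempt opportunistic containers, newest first, while usage exceeds the limit."""
    preempted = []
    while node.opportunistic and node.used > OVERSUBSCRIBE_LIMIT * node.capacity:
        request = node.opportunistic.pop()
        node.used -= request
        preempted.append(request)
    return preempted


node = Node(capacity=10, allocated=10, used=3)   # fully allocated, mostly idle
print(try_schedule_opportunistic(node, 4))       # fits under the limit
node.used += 2.5                                 # regular tasks ramp up
print(preempt_if_needed(node))                   # opportunistic work is evicted
```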

25.

Invention application
DESIGN-TIME INFORMATION BASED ON RUN-TIME ARTIFACTS IN TRANSIENT CLOUD-BASED DISTRIBUTED COMPUTING CLUSTERS [Active]

Publication (announcement) number: US20210334301A1

Publication (announcement) date: 2021-10-28

Application number: US17367194

Application date: 2021-07-02

Applicant: Cloudera, Inc.

Inventor: Sudhanshu Arora, Mark Donsky, Guang Yao Leng, Naren Koneru, Chang She, Vikas Singh, Himabindu Vuppula

IPC: G06F16/34, G06N5/04, G06F9/455, G06F16/38, G06F16/28

Abstract: Transient computing clusters can be temporarily provisioned in cloud-based infrastructure to run data processing tasks. Such tasks may be run by services operating in the clusters that consume and produce data including operational metadata. Techniques are introduced for tracking data lineage across multiple clusters, including transient computing clusters, based on the operational metadata. In some embodiments, operational metadata is extracted from the transient computing clusters and aggregated at a metadata system for analysis. Based on the analysis of the metadata, operations can be summarized at a cluster level even if the transient computing cluster no longer exists. Further relationships between workflows, such as dependencies or redundancies, can be identified and utilized to optimize the provisioning of computing clusters and tasks performed by the computing clusters.

26.

Granted invention patent
Apparatus and method for utilizing pre-computed results for query processing in a distributed database [Active]

Publication (announcement) number: US11151135B1

Publication (announcement) date: 2021-10-19

Application number: US15230240

Application date: 2016-08-05

Applicant: Cloudera, Inc.

Inventor: Douglas J. Cameron

IPC: G06F16/178, G06F16/2453, G06F16/951

Abstract: A pre-computed result module computes a result prior to receiving a query. The pre-computed result module includes instructions executed by a processor to assess a pre-computation query to designate each identified database source that contributes to the answer to the pre-computation query and corresponding database source metadata. A metadata signature is computed for each identified database source to create a store of identified database sources and corresponding metadata signatures. The query is evaluated to identify accessed database sources responsive to the query. A current metadata signature for each accessed database source is compared to the metadata signatures to identify each updated database source. Re-computed results are formed for each updated database source. Pre-computed results are utilized for each database source that is not updated. A response is supplied to the query using the re-computed results and the pre-computed results.
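
The signature comparison in this abstract can be sketched as follows. The choice of SHA-256 over a sorted metadata dictionary, the metadata fields, and the `recompute` callback are all assumptions for illustration; the abstract does not say how signatures are computed.

```python
import hashlib


def metadata_signature(metadata: dict) -> str:
    """Hash a source's metadata (assumed here: a flat dict with string keys)."""
    canonical = repr(sorted(metadata.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def answer(current_metadata, stored_signatures, precomputed, recompute):
    """Reuse pre-computed results for unchanged sources, recompute the rest."""
    results = {}
    for name, metadata in current_metadata.items():
        if metadata_signature(metadata) == stored_signatures.get(name):
            results[name] = precomputed[name]   # source not updated
        else:
            results[name] = recompute(name)     # source updated since pre-computation
    return results


# Hypothetical state captured when the pre-computation ran.
old = {"orders": {"rows": 100, "modified": "2021-01-01"},
       "users": {"rows": 10, "modified": "2021-01-01"}}
stored = {name: metadata_signature(meta) for name, meta in old.items()}
precomputed = {"orders": 5000, "users": 10}

# At query time, "orders" has changed and "users" has not.
now = {"orders": {"rows": 101, "modified": "2021-02-01"},
       "users": {"rows": 10, "modified": "2021-01-01"}}
print(answer(now, stored, precomputed, recompute=lambda name: f"recomputed:{name}"))
```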

27.

Granted invention patent
Apparatus and method for processing streaming data and forming visualizations thereof [Active]

Publication (announcement) number: US11108661B1

Publication (announcement) date: 2021-08-31

Application number: US16280397

Application date: 2019-02-20

Applicant: Cloudera, Inc.

Inventor: Charu Anchlia, Sushil Thomas

IPC: G06F15/173, H04L12/26, G06F16/9538, H04L29/06, H04L12/24

Abstract: A machine has a bus and a network interface circuit to receive different data streams from a network. The network interface circuit is connected to the network and the bus. A processor is connected to the bus. A memory is connected to the bus. The memory stores instructions executed by the processor to continuously increment aggregate functions associated with data parameters within the different data streams. Visualizations of the different data streams are periodically updated on different client devices connected to the network.
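
"Continuously incrementing aggregate functions" can be read as keeping running aggregates that are updated per arriving value, without re-scanning the stream. The sketch below shows that reading only; the `RunningAggregate` class and the set of aggregates it tracks are hypothetical, and the visualization side is not modelled.

```python
from dataclasses import dataclass


@dataclass
class RunningAggregate:
    """Aggregates for one data parameter, updated in constant time per value."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        # Defined as 0.0 for an empty stream to avoid dividing by zero.
        return self.total / self.count if self.count else 0.0


latency = RunningAggregate()
for value in [3.0, 1.0, 2.0]:      # values arriving from a stream
    latency.update(value)
print(latency.count, latency.minimum, latency.maximum, latency.mean)
```

A periodic refresh of a chart would then read the current fields of each `RunningAggregate` rather than recomputing from stored events.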

28.

Granted invention patent
Information based on run-time artifacts in a distributed computing cluster [Active]

Publication (announcement) number: US10514948B2

Publication (announcement) date: 2019-12-24

Application number: US15808805

Application date: 2017-11-09

Applicant: Cloudera, Inc.

Inventor: Vikas Singh, Sudhanshu Arora, Philip Zeyliger, Marcelo Masiero Vanzin, Chang She

IPC: G06F9/46, G06F16/14, G06F16/21

Abstract: Techniques are disclosed for inferring design-time information based on run-time artifacts generated by services operating in a distributed computing cluster. In an embodiment, a metadata system extracts metadata including run-time artifacts generated by services in a distributed computing cluster while processing a workflow including multiple jobs. The extracted metadata is processed to identify entities and entity relationships which can then be used to generate lineage information. Using the lineage information, the metadata system can infer design-time information associated with the workflow. The inferred design-time information can then be utilized to, for example, recreate the workflow, recreate previous versions of the workflow, optimize the workflow, etc.

29.

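Invention application
COMPACTION POLICY [Under examination - published]

Publication (announcement) number: US20190278783A1

Publication (announcement) date: 2019-09-12

Application number: US16424083

Application date: 2019-05-28

Applicant: Cloudera, Inc.

Inventor: Todd Lipcon

IPC: G06F16/27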

Abstract: A compaction policy imposing soft limits to optimize system efficiency is used to select various rowsets on which to perform compaction, each rowset storing keys within an interval called a keyspace. For example, the disclosed compaction policy results in a decrease in a height of the tablet, removes overlapping rowsets, and creates smaller sized rowsets. The compaction policy is based on the linear relationship shared between the keyspace height and the cost associated with performing an operation (e.g., an insert operation) in that keyspace. Accordingly, various factors determining which rowsets are to be compacted, how large the compacted rowsets are to be made, and when to perform the compaction, are considered within the disclosed compaction policy. Furthermore, a system and method for performing compaction on the selected datasets in a log-structured database is also provided.
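
One way to picture the "height" this abstract refers to is the largest number of rowsets whose key intervals overlap at a single key. The sketch below computes that quantity with a sweep over interval endpoints. It is an interpretation for illustration only: the function name, the integer keys, and the closed-interval convention are assumptions, and the selection policy itself is not modelled.

```python
def max_keyspace_height(rowsets):
    """Return the largest number of rowsets covering any single key.

    rowsets: list of (min_key, max_key) pairs, treated as closed intervals.
    """
    events = []
    for low, high in rowsets:
        events.append((low, 1))     # a rowset starts covering keys
        events.append((high, -1))   # a rowset stops covering keys
    # At equal keys, process starts before ends so touching intervals overlap.
    events.sort(key=lambda event: (event[0], -event[1]))
    height = best = 0
    for _, delta in events:
        height += delta
        best = max(best, height)
    return best


before = [(0, 10), (5, 15), (12, 20)]   # overlapping rowsets
after = [(0, 7), (8, 14), (15, 20)]     # hypothetical result of compacting them
print(max_keyspace_height(before), max_keyspace_height(after))
```

Under this reading, a lower height means fewer rowsets to consult for a given key, which matches the abstract's statement that cost grows linearly with keyspace height.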

30.

Granted invention patent
Ensuring properly ordered events in a distributed computing environment [Active]

Publication (announcement) number: US10171635B2

Publication (announcement) date: 2019-01-01

Application number: US14462445

Application date: 2014-08-18

Applicant: Cloudera, Inc.

Inventor: David Alves, Todd Lipcon

IPC: H04L29/08, H04L29/06, G06Q10/00, G06F1/14, G06Q50/26

Abstract: A first event occurs at a first computer at a first time, as measured by a local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by a local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, that is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, and the alternate second time is obtained from the second computer, and the order of the events is determined based on a comparison of the two times.
