CENTRALIZED CONFIGURATION OF A DISTRIBUTED COMPUTING CLUSTER
    Invention application (legal status: in force)

    Publication number: US20150039735A1

    Publication date: 2015-02-05

    Application number: US14509300

    Filing date: 2014-10-08

    Applicant: Cloudera, Inc.

    Abstract: Systems and methods for centralized configuration of a distributed computing cluster are disclosed. One embodiment of the disclosed technology provides a user environment that facilitates a selection of a service to be run on hosts in the distributed computing cluster and configuration of the service or hosts in the distributed computer cluster. The disclosed technology can further configure each of the hosts in the distributed computing cluster to run the service based on a set of configuration settings.
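    The abstract does not say how configuration settings are represented or pushed to hosts. As a minimal sketch, assuming a hypothetical model in which a service has cluster-wide default settings plus optional per-host overrides (ServiceConfig and render_host_configs are illustrative names, not Cloudera APIs), the central step of computing each host's effective configuration could look like this:

```python
from dataclasses import dataclass, field


@dataclass
class ServiceConfig:
    """Hypothetical central record for one service selected by the user."""
    name: str
    defaults: dict[str, str]  # cluster-wide settings chosen centrally
    host_overrides: dict[str, dict[str, str]] = field(default_factory=dict)


def render_host_configs(service: ServiceConfig, hosts: list[str]) -> dict[str, dict[str, str]]:
    """Compute the effective settings each host should use to run the service.

    Per-host overrides take precedence over cluster-wide defaults; the
    defaults themselves are copied, never modified.
    """
    configs: dict[str, dict[str, str]] = {}
    for host in hosts:
        effective = dict(service.defaults)
        effective.update(service.host_overrides.get(host, {}))
        configs[host] = effective
    return configs
```

    Keeping one central record and deriving per-host settings from it means a change to a default reaches every host on the next render, while deliberate per-host exceptions are preserved.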

    Ensuring properly ordered events in a distributed computing environment

    Publication number: US12255978B2

    Publication date: 2025-03-18

    Application number: US18357021

    Filing date: 2023-07-21

    Applicant: Cloudera, Inc.

    Abstract: A first event occurs at a first computer at a first time, as measured by a local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by a local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, that is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, and the alternate second time is obtained from the second computer, and the order of the events is determined based on a comparison of the two times.
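    The correction described above resembles the receive rule of a Lamport or hybrid logical clock. The abstract only says the alternate second time is "based on the first time", so this minimal sketch assumes it is the first time plus one microsecond tick, and also applies the correction when the two readings are equal so that the order is always strict (both are assumptions, as are the names stamp_second_event and happened_before):

```python
from dataclasses import dataclass

TICK_US = 1  # assumed increment added to the sender's time (1 microsecond)


@dataclass(frozen=True)
class Event:
    name: str
    timestamp_us: int  # time recorded for the event, in microseconds


def stamp_second_event(name: str, first_time_us: int, local_now_us: int) -> Event:
    """Time an event triggered by a message that carries the first event's time.

    If clock error makes the local reading no later than the sender's time,
    record an alternate time derived from the sender's time instead.
    """
    if first_time_us >= local_now_us:
        return Event(name, first_time_us + TICK_US)  # alternate second time
    return Event(name, local_now_us)


def happened_before(a: Event, b: Event) -> bool:
    """A third system orders the two events by comparing their recorded times."""
    return a.timestamp_us < b.timestamp_us
```

    With a receiver clock running 500 microseconds behind, a first event at 1,000,500 and a local reading of 1,000,000 yield a second event stamped 1,000,501, so the causal order survives the clock error.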

    Hyperparameter tuning using visual analytics in a data science platform

    Publication number: US12248888B2

    Publication date: 2025-03-11

    Application number: US16138684

    Filing date: 2018-09-21

    Applicant: Cloudera, Inc.

    Abstract: Techniques are disclosed for facilitating the tuning of hyperparameter values during the development of machine learning (ML) models using visual analytics in a data science platform. In an example embodiment, a computer-implemented data science platform is configured to generate, and display to a user, interactive visualizations that dynamically change in response to user interaction. Using the introduced technique, a user can, for example, 1) tune hyperparameters through an iterative process using visual analytics to gain and use insights into how certain hyperparameters affect model performance and convergence, 2) leverage automation and recommendations along this process to optimize the tuning given available resources, 3) collaborate with peers, and 4) view costs associated with executing experiments during the tuning process.

    Utilization-aware resource scheduling in a distributed computing cluster

    Publication number: US12223349B2

    Publication date: 2025-02-11

    Application number: US17379742

    Filing date: 2021-07-19

    Applicant: Cloudera, Inc.

    Inventor: Karthik Kambatla

    Abstract: Embodiments are disclosed for a utilization-aware approach to cluster scheduling that addresses resource fragmentation and improves cluster utilization and job throughput. In some embodiments, a resource manager at a master node considers the actual usage of running tasks and schedules opportunistic work on underutilized worker nodes. The resource manager monitors resource usage on these nodes and preempts opportunistic containers if this over-subscription becomes untenable. In doing so, the resource manager makes use of otherwise wasted resources while minimizing adverse effects on regularly scheduled tasks.
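    The abstract leaves the admission and preemption policy open. As a minimal sketch, assuming memory is the only tracked resource, a 90% usage threshold, and newest-first preemption (Node, HEADROOM, try_schedule_opportunistic, and preempt_if_oversubscribed are all hypothetical), the two decisions could be modelled as follows:

```python
from dataclasses import dataclass, field

HEADROOM = 0.9  # assumed: keep measured usage at or below 90% of capacity


@dataclass
class Node:
    """Hypothetical worker node; memory in MB is the only resource tracked."""
    capacity_mb: int
    allocated_mb: int     # promised to regular tasks (an allocation-only scheduler stops here)
    regular_used_mb: int  # what those regular tasks actually use, as monitored
    opportunistic: list[tuple[str, int]] = field(default_factory=list)  # (id, size_mb)

    def used_mb(self) -> int:
        return self.regular_used_mb + sum(size for _, size in self.opportunistic)


def try_schedule_opportunistic(node: Node, container_id: str, size_mb: int) -> bool:
    """Admit opportunistic work based on actual usage, not on allocations."""
    if node.used_mb() + size_mb <= HEADROOM * node.capacity_mb:
        node.opportunistic.append((container_id, size_mb))
        return True
    return False


def preempt_if_oversubscribed(node: Node) -> list[str]:
    """Preempt the newest opportunistic containers until usage is under the threshold."""
    preempted: list[str] = []
    while node.used_mb() > HEADROOM * node.capacity_mb and node.opportunistic:
        container_id, _ = node.opportunistic.pop()
        preempted.append(container_id)
    return preempted
```

    For example, a node with 10,000 MB fully allocated to regular tasks that actually use 4,000 MB can still admit a 3,000 MB opportunistic container; if the regular tasks later grow to 8,000 MB, that container is preempted so the regular work is not starved.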

    SNAPSHOT COMPARISON WITH METADATA COMPACTION
    Invention publication

    Publication number: US20230385157A1

    Publication date: 2023-11-30

    Application number: US18325853

    Filing date: 2023-05-30

    Applicant: Cloudera, Inc.

    Abstract: Snapshot or point-in-time image functionality improves the use of object-based datastores. An example system includes an object-based datastore and a metadata datastore associated with the object-based datastore. Instances of the metadata datastore are created as snapshot images of the object-based datastore. Comparison of snapshot images is important for database analytics, disaster recovery, data protection, and more. Example techniques provide comparison of snapshot images (as metadata datastore instances) and remain robust and accurate in view of compactions performed by the metadata datastore. An example technique includes generating and updating a graph-based data structure that captures relationships between metadata files in the metadata datastore, particularly between pre-compaction files and post-compaction files. The example technique further includes referencing the graph-based data structure to accelerate snapshot image comparison based on determining whether files of a source snapshot image were compacted into files of a destination snapshot image, and/or vice versa.
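    The graph-based structure described above can be pictured as a directed graph in which an edge from file A to file B means A was compacted into B. The following is a minimal sketch, assuming files are identified by name and showing only the source-to-destination direction (CompactionGraph and compare_snapshots are hypothetical names); it classifies each source file as unchanged, compacted into specific destination files, or removed, so that only compacted files need their contents compared:

```python
from collections import defaultdict, deque


class CompactionGraph:
    """Hypothetical DAG: an edge src -> dst means file src was compacted into dst."""

    def __init__(self) -> None:
        self.children: dict[str, set[str]] = defaultdict(set)

    def record_compaction(self, inputs: list[str], outputs: list[str]) -> None:
        for src in inputs:
            self.children[src].update(outputs)

    def descendants(self, file: str) -> set[str]:
        """All files that transitively received data compacted from `file` (BFS)."""
        seen: set[str] = set()
        queue = deque([file])
        while queue:
            for child in self.children.get(queue.popleft(), ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


def compare_snapshots(graph: CompactionGraph, source: set[str], destination: set[str]):
    """Return (unchanged, compacted, removed) for the files of the source snapshot."""
    unchanged = source & destination
    compacted: dict[str, set[str]] = {}
    removed: set[str] = set()
    for f in source - destination:
        targets = graph.descendants(f) & destination
        if targets:
            compacted[f] = targets  # data lives on in these destination files
        else:
            removed.add(f)
    return unchanged, compacted, removed
```

    A compacted output file may also hold data written after the source snapshot, so in this sketch the graph only narrows which files need a content-level comparison; it does not replace that comparison.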

    Manifest-based snapshots in distributed computing environments

    Publication number: US11768739B2

    Publication date: 2023-09-26

    Application number: US16943674

    Filing date: 2020-07-30

    Applicant: Cloudera, Inc.

    CPC classification number: G06F11/1464 G06F16/27 G06F11/1456 G06F2201/84

    Abstract: Scalable architectures, systems, and services are provided herein for creating manifest-based snapshots in distributed computing environments. In some embodiments, responsive to receiving a request to create a snapshot of a data object, a master node identifies multiple slave nodes on which the data object is stored in the cloud-computing platform and creates a snapshot manifest representing the snapshot of the data object. The snapshot manifest comprises a file including a listing of multiple file names in the snapshot manifest and reference information for locating the multiple files in the distributed database system. The snapshot can be created without disrupting I/O operations, e.g., in an online mode by various region servers as directed by the master node. A log roll approach to creating the snapshot, in which log files are marked, is also disclosed. The replaying of log entries can reduce the probability of causal inconsistency in the snapshot.
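    A manifest of this kind records references to existing files rather than copies of their data. As a minimal sketch, assuming the master node has already collected, per slave node (region server), the files that hold the data object, and assuming a JSON encoding (StoredFile and create_snapshot_manifest are hypothetical names, not HBase or Cloudera APIs), building the manifest could look like this:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    name: str
    node: str  # slave node / region server that holds the file
    path: str  # reference information used to locate the file later


def create_snapshot_manifest(snapshot_name: str, data_object: str,
                             files_by_node: dict[str, list[StoredFile]]) -> str:
    """List the file names of the snapshot plus where to find each file.

    Only references are recorded; no file contents are copied.
    """
    entries = [
        {"name": f.name, "node": f.node, "path": f.path}
        for node in sorted(files_by_node)  # deterministic order across nodes
        for f in files_by_node[node]
    ]
    manifest = {"snapshot": snapshot_name, "object": data_object, "files": entries}
    return json.dumps(manifest, sort_keys=True)
```

    Because the manifest is small and built from references, restoring or cloning the snapshot later amounts to resolving each listed path, which is one reason a manifest-based design can scale with the number of nodes.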

    ENSURING PROPERLY ORDERED EVENTS IN A DISTRIBUTED COMPUTING ENVIRONMENT

    Publication number: US20220382323A1

    Publication date: 2022-12-01

    Application number: US17836909

    Filing date: 2022-06-09

    Applicant: Cloudera, Inc.

    Abstract: A first event occurs at a first computer at a first time, as measured by a local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by a local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, that is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, and the alternate second time is obtained from the second computer, and the order of the events is determined based on a comparison of the two times.

    MUTATIONS IN A COLUMN STORE
    Invention application

    Publication number: US20210271653A1

    Publication date: 2021-09-02

    Application number: US17314813

    Filing date: 2021-05-07

    Applicant: Cloudera, Inc.

    Inventor: Todd Lipcon

    Abstract: Columnar storage provides many performance and space-saving benefits for analytic workloads, but previous mechanisms for handling single-row update transactions in column stores suffer from poor performance. A columnar data layout that supports both low-latency random access and high-throughput analytical access simplifies Hadoop architectures for use cases involving real-time data. In disclosed embodiments, mutations within a single row are executed atomically across columns and need not include the entire row. This allows faster updates without the overhead of reading or rewriting larger columns.
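    The abstract does not specify how partial-row mutations are stored. As a simplified, lock-based sketch (an in-memory store with one list per column, updated in place; ColumnStore and its methods are hypothetical), the two properties it describes, atomicity across columns and touching only the named columns, can be illustrated like this:

```python
import threading


class ColumnStore:
    """Hypothetical in-memory column store: one Python list per column."""

    def __init__(self, columns: list[str]) -> None:
        self.columns: dict[str, list] = {name: [] for name in columns}
        self._num_rows = 0
        self._lock = threading.Lock()

    def insert(self, row: dict) -> int:
        """Append a row (missing columns become None) and return its row id."""
        with self._lock:
            for name, values in self.columns.items():
                values.append(row.get(name))
            self._num_rows += 1
            return self._num_rows - 1

    def mutate(self, row_id: int, changes: dict) -> None:
        """Update a subset of one row's columns: all listed columns change, or none do."""
        with self._lock:
            unknown = sorted(set(changes) - set(self.columns))
            if unknown or not 0 <= row_id < self._num_rows:
                raise KeyError(f"invalid mutation: row {row_id}, unknown columns {unknown}")
            # Validation runs before any write, so a rejected mutation leaves the row
            # untouched; columns not named in `changes` are never read or rewritten.
            for name, value in changes.items():
                self.columns[name][row_id] = value

    def row(self, row_id: int) -> dict:
        """Read a full row under the same lock, so readers never see half a mutation."""
        with self._lock:
            return {name: values[row_id] for name, values in self.columns.items()}
```

    A production column store would avoid rewriting column data in place; the sketch only shows the contract a caller sees, namely that a mutation naming two of ten columns touches exactly those two, atomically.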

    Utilization-aware resource scheduling in a distributed computing cluster

    Publication number: US11099892B2

    Publication date: 2021-08-24

    Application number: US16797996

    Filing date: 2020-02-21

    Applicant: Cloudera, Inc.

    Inventor: Karthik Kambatla

    Abstract: Embodiments are disclosed for a utilization-aware approach to cluster scheduling that addresses resource fragmentation and improves cluster utilization and job throughput. In some embodiments, a resource manager at a master node considers the actual usage of running tasks and schedules opportunistic work on underutilized worker nodes. The resource manager monitors resource usage on these nodes and preempts opportunistic containers if this over-subscription becomes untenable. In doing so, the resource manager makes use of otherwise wasted resources while minimizing adverse effects on regularly scheduled tasks.
