Utilization-aware resource scheduling in a distributed computing cluster

    Publication No.: US10572306B2

    Publication date: 2020-02-25

    Application No.: US15595713

    Filing date: 2017-05-15

    Applicant: Cloudera, Inc.

    Inventor: Karthik Kambatla

    Abstract: Embodiments are disclosed for a utilization-aware approach to cluster scheduling that addresses resource fragmentation and improves cluster utilization and job throughput. In some embodiments, a resource manager at a master node considers the actual resource usage of running tasks and schedules opportunistic work on underutilized worker nodes. The resource manager monitors resource usage on these nodes and preempts opportunistic containers if the over-subscription becomes untenable. In doing so, the resource manager puts otherwise wasted resources to use while minimizing adverse effects on regularly scheduled tasks.
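
As a rough illustration of the scheduling loop this abstract describes, here is a minimal Python sketch. The class names, the memory-only resource model, and the two thresholds (`oversubscribe_threshold`, `preempt_threshold`) are assumptions made for this example and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Container:
    task: str
    allocated_mb: int            # memory reserved for the task
    used_mb: int = 0             # memory the task actually uses, as reported by the node
    opportunistic: bool = False


@dataclass
class WorkerNode:
    name: str
    capacity_mb: int
    containers: list = field(default_factory=list)

    def guaranteed_mb(self):
        # Only regular (guaranteed) containers count against node capacity.
        return sum(c.allocated_mb for c in self.containers if not c.opportunistic)

    def used_mb(self):
        # Actual usage of everything running on the node.
        return sum(c.used_mb for c in self.containers)


class ResourceManager:
    def __init__(self, nodes, oversubscribe_threshold=0.8, preempt_threshold=0.95):
        self.nodes = nodes
        self.oversubscribe_threshold = oversubscribe_threshold  # assumed value
        self.preempt_threshold = preempt_threshold              # assumed value

    def schedule(self, task, request_mb):
        # First try a regular allocation against unreserved capacity.
        for node in self.nodes:
            if node.capacity_mb - node.guaranteed_mb() >= request_mb:
                return self._launch(node, task, request_mb, opportunistic=False)
        # Otherwise look for a node whose actual usage leaves enough slack.
        for node in self.nodes:
            limit = self.oversubscribe_threshold * node.capacity_mb
            if node.used_mb() + request_mb <= limit:
                return self._launch(node, task, request_mb, opportunistic=True)
        return None

    def _launch(self, node, task, request_mb, opportunistic):
        container = Container(task, request_mb, opportunistic=opportunistic)
        node.containers.append(container)
        return container

    def monitor(self):
        # Preempt opportunistic containers (newest first) on overloaded nodes.
        preempted = []
        for node in self.nodes:
            limit = self.preempt_threshold * node.capacity_mb
            while node.used_mb() > limit:
                victims = [c for c in node.containers if c.opportunistic]
                if not victims:
                    break
                node.containers.remove(victims[-1])
                preempted.append(victims[-1].task)
        return preempted
```

In this sketch, regular containers are never preempted; only work that was admitted on the basis of observed slack is evicted when actual usage climbs back toward node capacity.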

    DESIGN-TIME INFORMATION BASED ON RUN-TIME ARTIFACTS IN TRANSIENT CLOUD-BASED DISTRIBUTED COMPUTING CLUSTERS

    Publication No.: US20190138654A1

    Publication date: 2019-05-09

    Application No.: US15943603

    Filing date: 2018-04-02

    Applicant: Cloudera, Inc.

    Abstract: Transient computing clusters can be temporarily provisioned in cloud-based infrastructure to run data processing tasks. Such tasks may be run by services operating in the clusters that consume and produce data including operational metadata. Techniques are introduced for tracking data lineage across multiple clusters, including transient computing clusters, based on the operational metadata. In some embodiments, operational metadata is extracted from the transient computing clusters and aggregated at a metadata system for analysis. Based on the analysis of the metadata, operations can be summarized at a cluster level even if the transient computing cluster no longer exists. Further relationships between workflows, such as dependencies or redundancies, can be identified and utilized to optimize the provisioning of computing clusters and tasks performed by the computing clusters.
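
The idea of keeping lineage queryable after a transient cluster is gone can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `MetadataStore` class, its record layout, and its method names are hypothetical and do not come from the patent.

```python
from collections import defaultdict


class MetadataStore:
    """Central store that outlives the clusters it collects metadata from."""

    def __init__(self):
        self.records = []

    def ingest(self, cluster_id, job, inputs, outputs):
        # Operational metadata extracted from a (possibly transient) cluster.
        self.records.append(
            {"cluster": cluster_id, "job": job,
             "inputs": list(inputs), "outputs": list(outputs)}
        )

    def cluster_summary(self, cluster_id):
        # Cluster-level summary that works even after the cluster is terminated.
        jobs = [r for r in self.records if r["cluster"] == cluster_id]
        return {
            "jobs": [r["job"] for r in jobs],
            "inputs": sorted({d for r in jobs for d in r["inputs"]}),
            "outputs": sorted({d for r in jobs for d in r["outputs"]}),
        }

    def upstream(self, dataset):
        # All datasets that `dataset` was transitively derived from.
        produced_from = defaultdict(set)
        for r in self.records:
            for out in r["outputs"]:
                produced_from[out].update(r["inputs"])
        seen, stack = set(), [dataset]
        while stack:
            for parent in produced_from.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def job_dependencies(self):
        # (producer, consumer) pairs where one job reads what another wrote.
        return sorted(
            (p["job"], c["job"])
            for p in self.records for c in self.records
            if p is not c and set(p["outputs"]) & set(c["inputs"])
        )
```

Because the records live in the store rather than in the clusters, lineage that spans two short-lived clusters can still be traced once both have been torn down.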

    ENSURING PROPERLY ORDERED EVENTS IN A DISTRIBUTED COMPUTING ENVIRONMENT

    Publication No.: US20190109930A1

    Publication date: 2019-04-11

    Application No.: US16198677

    Filing date: 2018-11-21

    Applicant: Cloudera, Inc.

    CPC classification number: H04L69/28 G06F1/14 G06Q10/00 G06Q50/26 H04L67/10

    Abstract: A first event occurs at a first computer at a first time, as measured by the first computer's local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by the second computer's local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, which is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, the alternate second time is obtained from the second computer, and the order of the events is determined by comparing the two times.
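
The timestamp adjustment can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the `Node` class, integer millisecond timestamps, and the `TICK` increment added to the first time are assumptions made here (the abstract only says the alternate time is "based on" the first time).

```python
TICK = 1  # assumed smallest increment, in milliseconds


class Node:
    def __init__(self, name, clock):
        self.name = name
        self.clock = clock      # callable returning this node's local time
        self.events = {}        # event id -> recorded time

    def record_event(self, event_id, cause_time=None):
        """Record an event; `cause_time` is the time carried in the message
        that initiated it, or None for a locally initiated event."""
        local = self.clock()
        if cause_time is not None and cause_time >= local:
            # Local clock is behind the sender's: use an alternate time
            # derived from the first time so the causal order is preserved.
            recorded = cause_time + TICK
        else:
            recorded = local
        self.events[event_id] = recorded
        return recorded


def happened_before(node_a, event_a, node_b, event_b):
    # What a third system does: fetch both recorded times and compare them.
    return node_a.events[event_a] < node_b.events[event_b]
```

For example, if the first node's clock reads 1000 and the second node's clock reads 990, the second event is recorded at 1001 rather than 990, so a later comparison places it after the first event.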

    Configuring a system to collect and aggregate datasets

    Publication No.: US10187461B2

    Publication date: 2019-01-22

    Application No.: US15098198

    Filing date: 2016-04-13

    Applicant: Cloudera, Inc.

    Abstract: Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes identifying a data source in the system from which a dataset is to be collected; configuring the machine in the system that generates the dataset to send the dataset to the data source; identifying an arrival location where the collected dataset is to be aggregated or written; and/or configuring an agent node by specifying the data source as the agent node's source and the arrival location as the agent node's sink.
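
The four configuration steps can be sketched as follows. This is a minimal, in-memory Python illustration: the names `configure_machine` and `AgentNode`, and the use of a `deque` as the data source and a list as the arrival location, are assumptions made for this example and not the patent's or any product's API.

```python
from collections import deque


def configure_machine(data_source):
    """Return an `emit` function for the generating machine that sends
    each record it produces to the given data source."""
    def emit(record):
        data_source.append(record)
    return emit


class AgentNode:
    def __init__(self, source, sink):
        self.source = source    # where the agent collects records from
        self.sink = sink        # arrival location where records are written

    def drain(self):
        # Move everything currently available from source to sink, in order.
        moved = 0
        while self.source:
            self.sink.append(self.source.popleft())
            moved += 1
        return moved


# Step 1: identify the data source.
data_source = deque()
# Step 2: configure the generating machine to send to that source.
emit = configure_machine(data_source)
# Step 3: identify the arrival location.
arrival_location = []
# Step 4: configure the agent node with that source and sink.
agent = AgentNode(source=data_source, sink=arrival_location)
```

Keeping the source and sink as plain parameters of the agent means the same agent logic can be pointed at a different arrival location without touching the machine that generates the data.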

    Memory allocation buffer for reduction of heap fragmentation

    Document type: Granted patent

    Legal status: In force

    Publication No.: US09552165B2

    Publication date: 2017-01-24

    Application No.: US14846413

    Filing date: 2015-09-04

    Applicant: Cloudera, Inc.

    Inventor: Todd Lipcon

    Abstract: Systems and methods of a memory allocation buffer to reduce heap fragmentation. In one embodiment, the memory allocation buffer structures a memory arena dedicated to a target region that is one of a plurality of regions in a server in a database cluster such as an HBase cluster. The memory arena has a chunk size (e.g., 2 MB) and an offset pointer. Data objects in write requests targeted to the region are received and inserted into the memory arena at the location specified by the offset pointer. When the memory arena is filled, a new one is allocated. When a MemStore of the target region is flushed, all memory arenas for the target region are freed. This reduces the heap fragmentation that is responsible for long and/or frequent garbage collection pauses.
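
The arena mechanics (fixed-size chunk, offset pointer, new arena when full, release everything on flush) can be sketched as below. This is a minimal Python illustration of the bookkeeping only; the class names are assumptions, and Python's own memory management means it does not reproduce the JVM garbage-collection benefit the abstract describes.

```python
CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB, the example chunk size from the abstract


class Arena:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.offset = 0         # next free position in the chunk

    def try_insert(self, data):
        end = self.offset + len(data)
        if end > len(self.buf):
            return None         # not enough room left in this arena
        start = self.offset
        self.buf[start:end] = data
        self.offset = end       # bump the offset pointer past the new data
        return start


class RegionAllocationBuffer:
    """Arenas dedicated to a single target region."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.arenas = [Arena(chunk_size)]

    def insert(self, data):
        if len(data) > self.chunk_size:
            raise ValueError("object larger than one arena")
        start = self.arenas[-1].try_insert(data)
        if start is None:
            # Current arena is full: allocate a new one and retry.
            self.arenas.append(Arena(self.chunk_size))
            start = self.arenas[-1].try_insert(data)
        return len(self.arenas) - 1, start   # (arena index, offset)

    def flush(self):
        # On a MemStore flush, drop every arena for the region at once.
        released = len(self.arenas)
        self.arenas = [Arena(self.chunk_size)]
        return released
```

Because objects for one region are packed into a few large contiguous chunks and released together, freeing them leaves large holes rather than many small ones scattered across the heap.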

    Background format optimization for enhanced SQL-like queries in Hadoop

    Document type: Granted patent

    Legal status: In force

    Publication No.: US09477731B2

    Publication date: 2016-10-25

    Application No.: US14043753

    Filing date: 2013-10-01

    Applicant: Cloudera, Inc.

    Abstract: A format conversion engine for Apache Hadoop that converts data from its original format to a database-like format at certain time points for use by a low latency (LL) query engine. The format conversion engine comprises a daemon that is installed on each data node in a Hadoop cluster. The daemon comprises a scheduler and a converter. The scheduler determines when to perform the format conversion and notifies the converter when the time comes. The converter converts data on the data node from its original format to a database-like format for use by the low latency (LL) query engine.
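
The scheduler/converter split inside the daemon can be sketched as follows. This is a minimal Python illustration under stated assumptions: a fixed time interval stands in for "certain time points", CSV stands in for the original format, and a column-oriented dictionary stands in for the "database-like format"; none of these specifics come from the patent.

```python
import csv
import io


class Converter:
    def __init__(self):
        self.converted = {}     # file name -> column-oriented data

    def convert(self, name, raw_text):
        # Parse CSV rows, then pivot them into one list of values per column.
        rows = list(csv.DictReader(io.StringIO(raw_text)))
        header = list(rows[0]) if rows else []
        columns = {key: [row[key] for row in rows] for key in header}
        self.converted[name] = columns
        return columns


class Scheduler:
    def __init__(self, converter, interval_s):
        self.converter = converter
        self.interval_s = interval_s    # assumed policy: convert at most once per interval
        self.last_run = None

    def tick(self, now, pending):
        """Called periodically by the daemon; `pending` maps file names on
        this data node to their raw contents."""
        if self.last_run is not None and now - self.last_run < self.interval_s:
            return []           # not time yet
        self.last_run = now
        done = []
        for name, raw_text in pending.items():
            self.converter.convert(name, raw_text)  # notify the converter
            done.append(name)
        return done
```

Separating the decision of when to convert from the conversion itself lets the timing policy change (for instance, to run only when the node is idle) without touching the conversion code.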

    MEMORY ALLOCATION BUFFER FOR REDUCTION OF HEAP FRAGMENTATION

    Document type: Patent application

    Legal status: In force

    Publication No.: US20150378618A1

    Publication date: 2015-12-31

    Application No.: US14846413

    Filing date: 2015-09-04

    Applicant: Cloudera, Inc.

    Inventor: Todd Lipcon

    Abstract: Systems and methods of a memory allocation buffer to reduce heap fragmentation. In one embodiment, the memory allocation buffer structures a memory arena dedicated to a target region that is one of a plurality of regions in a server in a database cluster such as an HBase cluster. The memory arena has a chunk size (e.g., 2 MB) and an offset pointer. Data objects in write requests targeted to the region are received and inserted into the memory arena at the location specified by the offset pointer. When the memory arena is filled, a new one is allocated. When a MemStore of the target region is flushed, all memory arenas for the target region are freed. This reduces the heap fragmentation that is responsible for long and/or frequent garbage collection pauses.
