Low latency query engine for Apache Hadoop

    Publication number: US09990399B2

    Publication date: 2018-06-05

    Application number: US15154727

    Application date: 2016-05-13

    Applicant: Cloudera, Inc.

    Abstract: A low latency query engine for APACHE HADOOP™ that provides real-time or near real-time, ad hoc query capability, while completing batch-processing of MapReduce. In one embodiment, the low latency query engine comprises a daemon that is installed on data nodes in a HADOOP™ cluster for handling query requests and all internal requests related to query execution. In a further embodiment, the low latency query engine comprises a daemon for providing name service and metadata distribution. The low latency query engine receives a query request via client, turns the request into collections of plan fragments and coordinates parallel and optimized execution of the plan fragments on remote daemons to generate results at a much faster speed than existing batch-oriented processing frameworks.
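
The coordination pattern in the abstract (one request split into plan fragments that run in parallel on the nodes holding the data, with partial results merged by a coordinator) can be illustrated with a small sketch. This is not the patented engine: `NODE_DATA`, `execute_fragment`, and `run_count_query` are hypothetical names, the "remote daemons" are simulated with threads in one process, and the only query supported is a filtered count.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the rows stored on each data node.
NODE_DATA: Dict[str, List[int]] = {
    "node1": [3, 8, 15],
    "node2": [4, 23],
    "node3": [42, 16, 1],
}

Predicate = Callable[[int], bool]
Fragment = Tuple[str, Predicate]  # (node that runs it, filter to apply)


def plan_fragments(predicate: Predicate) -> List[Fragment]:
    """Turn one count query into one plan fragment per data node."""
    return [(node, predicate) for node in sorted(NODE_DATA)]


def execute_fragment(fragment: Fragment) -> int:
    """Simulated daemon: count matching rows in the node's local data."""
    node, predicate = fragment
    return sum(1 for value in NODE_DATA[node] if predicate(value))


def run_count_query(predicate: Predicate) -> int:
    """Coordinator: run all fragments in parallel and merge partial counts."""
    fragments = plan_fragments(predicate)
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partial_counts = list(pool.map(execute_fragment, fragments))
    return sum(partial_counts)
```

Calling `run_count_query(lambda v: v > 10)` counts the values above 10 across all three simulated nodes without moving the raw rows to the coordinator; only the per-node counts are merged.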

    Virtual machine image encryption
    34.
    Type: Granted invention patent

    Publication number: US09934382B2

    Publication date: 2018-04-03

    Application number: US14526372

    Application date: 2014-10-28

    Applicant: Cloudera, Inc.

    Inventor: Eduardo Garcia

    Abstract: Embodiments of the present disclosure include systems and methods for encrypting a virtual machine image and accessing an encrypted virtual machine image. According to some embodiments, an encryption module can encrypt a virtual machine image and place an encryption boot loader. The encryption boot loader may be extracted from the encrypted virtual machine image, transmitted to, and stored at a key storage system. Upon a request to boot an operating system associated with the encrypted virtual machine image, a pre-boot execution environment may communicate with an image service to retrieve the encryption boot loader from the remote key storage system. The virtual machine image may therefore be decrypted using the encryption boot loader, which may allow booting of the operating system.

    DATABASE WORKLOAD ANALYSIS AND OPTIMIZATION VISUALIZATIONS

    Publication number: US20170132296A1

    Publication date: 2017-05-11

    Application number: US15345375

    Application date: 2016-11-07

    Applicant: Cloudera, Inc.

    Inventor: Yihua Ding

    CPC classification number: G06F17/30554

    Abstract: Techniques are described for analyzing usage of data stored in a data storage system without accessing the stored data. In some embodiments, workload data indicative of queries executed at the data storage system on stored data is received. This workload data can include query logs generated during execution of the queries. The workload data is processed to identify data elements such as tables, columns, and views associated with the stored data, as well as information regarding usage of the identified data elements. Usage can include operations performed on the data elements during execution of the queries. Based on this processing, relationships between the identified data elements can be inferred, and visualizations can be generated that convey information regarding usage of the data stored at the data storage system. Visualizations can include, among others, usage heatmap diagrams, join diagrams, column family diagrams, filter diagrams, view lineage diagrams, data flow diagrams, denormalization diagrams, and workload distribution diagrams.
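
As a rough illustration of deriving usage information from query logs alone, the sketch below counts how often each table is referenced in a list of SQL strings. It is an assumption-laden toy, not the method claimed in the application: `TABLE_REF` and `table_usage` are hypothetical names, and the regular expression only recognizes simple `FROM name` and `JOIN name` references (no subqueries, quoting, or comma-separated table lists).

```python
import re
from collections import Counter
from typing import Iterable

# Matches the identifier that directly follows FROM or JOIN (case-insensitive).
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_usage(query_log: Iterable[str]) -> Counter:
    """Count table references across logged queries without touching the data."""
    usage: Counter = Counter()
    for query in query_log:
        # Normalize case so "Orders" and "orders" count as the same table.
        usage.update(name.lower() for name in TABLE_REF.findall(query))
    return usage
```

Counts like these could feed something like a usage heatmap; a real implementation would need a proper SQL parser to handle views, aliases, and nested queries.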

    LOW LATENCY QUERY ENGINE FOR APACHE HADOOP

    Publication number: US20170132283A1

    Publication date: 2017-05-11

    Application number: US15154727

    Application date: 2016-05-13

    Applicant: Cloudera, Inc.

    Abstract: A low latency query engine for APACHE HADOOP™ that provides real-time or near real-time, ad hoc query capability, while completing batch-processing of MapReduce. In one embodiment, the low latency query engine comprises a daemon that is installed on data nodes in a HADOOP™ cluster for handling query requests and all internal requests related to query execution. In a further embodiment, the low latency query engine comprises a daemon for providing name service and metadata distribution. The low latency query engine receives a query request via client, turns the request into collections of plan fragments and coordinates parallel and optimized execution of the plan fragments on remote daemons to generate results at a much faster speed than existing batch-oriented processing frameworks.

    VIRTUAL MACHINE IMAGE ENCRYPTION
    37.
    Type: Invention application
    Status: Granted

    Publication number: US20160350535A1

    Publication date: 2016-12-01

    Application number: US14526372

    Application date: 2014-10-28

    Applicant: Cloudera, Inc.

    Inventor: Eduardo Garcia

    Abstract: Embodiments of the present disclosure include systems and methods for encrypting a virtual machine image and accessing an encrypted virtual machine image. According to some embodiments, an encryption module can encrypt a virtual machine image and place an encryption boot loader. The encryption boot loader may be extracted from the encrypted virtual machine image, transmitted to, and stored at a key storage system. Upon a request to boot an operating system associated with the encrypted virtual machine image, a pre-boot execution environment may communicate with an image service to retrieve the encryption boot loader from the remote key storage system. The virtual machine image may therefore be decrypted using the encryption boot loader, which may allow booting of the operating system.

    CONFIGURING A SYSTEM TO COLLECT AND AGGREGATE DATASETS
    38.
    Type: Invention application
    Status: Under examination (published)

    Publication number: US20160226968A1

    Publication date: 2016-08-04

    Application number: US15098198

    Application date: 2016-04-13

    Applicant: Cloudera, Inc.

    Abstract: Methods for configuring a system to collect and aggregate datasets are disclosed. One embodiment includes identifying a data source in the system from which a dataset is to be collected; configuring a machine in the system that generates the dataset to send the dataset to the data source; identifying an arrival location where the collected dataset is to be aggregated or written; and/or configuring an agent node by specifying the data source as the source for the agent node and the arrival location as the sink for the agent node.
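
The source/sink configuration step in the abstract can be pictured with a minimal sketch. Everything here is hypothetical and in-memory: `AgentNode` and `configure_agent` are illustrative names, the data source is a plain list of records, and the arrival location is a list that the sink appends to. It shows only the wiring, not a real collection system.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AgentNode:
    """Hypothetical agent: pulls records from a source, pushes them to a sink."""
    name: str
    source: Callable[[], Iterable[str]]  # yields records from the data source
    sink: Callable[[str], None]          # writes one record to the arrival location

    def run(self) -> int:
        """Move every available record from source to sink; return the count."""
        moved = 0
        for record in self.source():
            self.sink(record)
            moved += 1
        return moved


def configure_agent(name: str, data_source: List[str],
                    arrival_location: List[str]) -> AgentNode:
    """Bind the identified data source and arrival location to one agent node."""
    return AgentNode(
        name=name,
        source=lambda: iter(data_source),
        sink=arrival_location.append,
    )
```

Keeping the source and sink as separate callables means the same agent logic could, in principle, be pointed at a different source or arrival location purely through configuration.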

    COLLECTING AND AGGREGATING LOG DATA WITH FAULT TOLERANCE
    39.
    Type: Invention application
    Status: Granted

    Publication number: US20150317231A1

    Publication date: 2015-11-05

    Application number: US14796812

    Application date: 2015-07-10

    Applicant: Cloudera, Inc.

    Abstract: Systems and methods of collecting and aggregating log data with fault tolerance are disclosed. One embodiment includes one or more devices that generate log data, each associated with an agent node that collects the log data, wherein the agent node generates a batch comprising multiple messages from the log data and assigns a tag to the batch. In one embodiment, the agent node further computes a checksum for the batch of multiple messages. The system may further include a collector device associated with a collector tier having a collector node to which the agent node sends the log data, wherein the collector determines the checksum for the batch of multiple messages received from the agent node.
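
The batch, tag, and checksum steps in the abstract can be sketched as follows. The abstract does not name a checksum algorithm, so CRC32 from Python's standard `zlib` module is used here purely as an assumption; `Batch`, `make_batch`, and `collector_accepts` are hypothetical names, and the tag is simply passed in by the caller.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    tag: str               # identifier the agent assigns to the batch
    messages: List[bytes]  # log messages grouped into the batch
    checksum: int          # checksum computed by the agent


def batch_checksum(messages: List[bytes]) -> int:
    """CRC32 over all messages in order, each prefixed with its 4-byte length."""
    crc = 0
    for message in messages:
        crc = zlib.crc32(len(message).to_bytes(4, "big"), crc)
        crc = zlib.crc32(message, crc)
    return crc


def make_batch(tag: str, messages: List[bytes]) -> Batch:
    """Agent side: group messages, attach the tag, compute the checksum."""
    return Batch(tag=tag, messages=list(messages),
                 checksum=batch_checksum(messages))


def collector_accepts(batch: Batch) -> bool:
    """Collector side: recompute the checksum and compare it to the agent's."""
    return batch_checksum(batch.messages) == batch.checksum
```

If the recomputed checksum does not match, the collector in this sketch would reject the batch, and the tag gives both sides a handle for requesting or acknowledging a resend.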

    Memory allocation buffer for reduction of heap fragmentation
    40.
    Type: Granted invention patent
    Status: Granted

    Publication number: US09128949B2

    Publication date: 2015-09-08

    Application number: US13745461

    Application date: 2013-01-18

    Applicant: Cloudera, Inc.

    Inventor: Todd Lipcon

    Abstract: Systems and methods of a memory allocation buffer to reduce heap fragmentation are disclosed. In one embodiment, the memory allocation buffer structures a memory arena dedicated to a target region that is one of a plurality of regions in a server in a database cluster such as an HBase cluster. The memory arena has a chunk size (e.g., 2 MB) and an offset pointer. Data objects in write requests targeted to the region are received and inserted into the memory arena at the location specified by the offset pointer. When the memory arena is filled, a new one is allocated. When a MemStore of the target region is flushed, all memory arenas for the target region are freed. This reduces the heap fragmentation that is responsible for long and/or frequent garbage collection pauses.
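
The arena mechanics described in the abstract (a fixed-size chunk, an offset pointer that advances on each insert, a fresh arena when the current one is full, and a bulk release on flush) can be sketched as below. This is an illustrative model only, not HBase's actual code: `Arena` and `RegionAllocator` are hypothetical names, a new arena is allocated when the next object does not fit, and objects larger than one chunk are simply rejected.

```python
from typing import List, Optional, Tuple

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB, the example chunk size from the abstract


class Arena:
    """One fixed-size chunk with an offset pointer to the next free byte."""

    def __init__(self, size: int = CHUNK_SIZE) -> None:
        self.buf = bytearray(size)
        self.offset = 0

    def try_insert(self, data: bytes) -> Optional[int]:
        """Copy data at the offset pointer; return its start, or None if full."""
        if self.offset + len(data) > len(self.buf):
            return None
        start = self.offset
        self.buf[start:start + len(data)] = data
        self.offset += len(data)
        return start


class RegionAllocator:
    """Hypothetical per-region allocator that owns that region's arenas."""

    def __init__(self, chunk_size: int = CHUNK_SIZE) -> None:
        self.chunk_size = chunk_size
        self.arenas: List[Arena] = [Arena(chunk_size)]

    def insert(self, data: bytes) -> Tuple[int, int]:
        """Store data and return (arena index, offset within that arena)."""
        if len(data) > self.chunk_size:
            raise ValueError("object larger than one chunk; not handled here")
        start = self.arenas[-1].try_insert(data)
        if start is None:  # current arena is full, so allocate a new one
            self.arenas.append(Arena(self.chunk_size))
            start = self.arenas[-1].try_insert(data)
        return len(self.arenas) - 1, start

    def read(self, arena_index: int, offset: int, length: int) -> bytes:
        """Return a copy of the bytes stored at the given location."""
        return bytes(self.arenas[arena_index].buf[offset:offset + length])

    def flush(self) -> None:
        """Release every arena for the region at once, as on a MemStore flush."""
        self.arenas = [Arena(self.chunk_size)]
```

Because objects for one region sit together in large chunks that are released together, the memory that becomes free is in chunk-sized pieces rather than many small holes scattered across the heap.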
