Accumulating and flushing mutations in a column store

    Publication No.: US12222915B2

    Publication date: 2025-02-11

    Application No.: US17314813

    Application date: 2021-05-07

    Applicant: Cloudera, Inc.

    Inventor: Todd Lipcon

    Abstract: Columnar storage provides many performance and space saving benefits for analytic workloads, but previous mechanisms for handling single row update transactions in column stores suffer from poor performance. A columnar data layout facilitates both low-latency random access capabilities together with high-throughput analytical access capabilities, simplifying Hadoop architectures for use cases involving real-time data. In disclosed embodiments, mutations within a single row are executed atomically across columns and do not necessarily include the entirety of a row. This allows for faster updates without the overhead of reading or rewriting larger columns.
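The idea of a partial-row mutation that is applied atomically, accumulated, and later flushed can be illustrated with a minimal sketch. This is not the patented implementation: the `ColumnStore` class, its single lock, and the in-memory delta list are all assumptions chosen to keep the example small.

```python
import threading


class ColumnStore:
    """Toy column store: one list per column plus a shared delta log (hypothetical)."""

    def __init__(self, columns):
        self._base = {name: [] for name in columns}
        self._deltas = []  # (row_id, {column: new_value}), in commit order
        self._num_rows = 0
        self._lock = threading.Lock()

    def append_row(self, row):
        """Append a full row to the base columns and return its row id."""
        values = [row[name] for name in self._base]  # fail before touching any column
        with self._lock:
            for column, value in zip(self._base.values(), values):
                column.append(value)
            self._num_rows += 1
            return self._num_rows - 1

    def mutate(self, row_id, changes):
        """Record an update to a subset of columns as one atomic delta."""
        unknown = set(changes) - set(self._base)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        with self._lock:
            if not 0 <= row_id < self._num_rows:
                raise IndexError(row_id)
            # One log entry covers every changed column, so a reader sees
            # either all of the changes or none of them.
            self._deltas.append((row_id, dict(changes)))

    def read(self, row_id, columns):
        """Read the requested columns, overlaying any unflushed deltas."""
        with self._lock:
            result = {c: self._base[c][row_id] for c in columns}
            for rid, changes in self._deltas:
                if rid == row_id:
                    for c in columns:
                        if c in changes:
                            result[c] = changes[c]
            return result

    def pending(self):
        """Number of accumulated mutations not yet flushed."""
        with self._lock:
            return len(self._deltas)

    def flush(self):
        """Apply accumulated deltas to the base columns and clear the log."""
        with self._lock:
            for rid, changes in self._deltas:
                for c, v in changes.items():
                    self._base[c][rid] = v
            self._deltas.clear()


store = ColumnStore(["id", "status", "payload"])
rid = store.append_row({"id": 1, "status": "new", "payload": "x" * 1000})
store.mutate(rid, {"status": "done"})  # the large payload column is never read or rewritten
print(store.read(rid, ["id", "status"]))
```

In this sketch the update touches only the `status` column, which mirrors the abstract's point that a mutation need not include the entire row.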

    COMPACTION POLICY
    Type: Invention patent application
    Status: Under examination, published

    Publication No.: US20190278783A1

    Publication date: 2019-09-12

    Application No.: US16424083

    Application date: 2019-05-28

    Applicant: Cloudera, Inc.

    Inventor: Todd Lipcon

    Abstract: A compaction policy imposing soft limits to optimize system efficiency is used to select various rowsets on which to perform compaction, each rowset storing keys within an interval called a keyspace. For example, the disclosed compaction policy results in a decrease in a height of the tablet, removes overlapping rowsets, and creates smaller sized rowsets. The compaction policy is based on the linear relationship shared between the keyspace height and the cost associated with performing an operation (e.g., an insert operation) in that keyspace. Accordingly, various factors determining which rowsets are to be compacted, how large the compacted rowsets are to be made, and when to perform the compaction, are considered within the disclosed compaction policy. Furthermore, a system and method for performing compaction on the selected datasets in a log-structured database is also provided.
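As a rough illustration of selecting rowsets so that tablet height drops, the sketch below treats height at a key as the number of rowsets whose key interval covers it, and picks the group of rowsets whose compaction removes the most overlap under a size budget. The `Rowset` type, the contiguous-window search, and the `budget_mb` soft limit are assumptions for illustration only; the abstract does not specify the selection algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rowset:
    min_key: float
    max_key: float
    size_mb: int


def height_at(rowsets, key):
    """Number of rowsets whose key interval contains `key`."""
    return sum(1 for rs in rowsets if rs.min_key <= key <= rs.max_key)


def union_length(rowsets):
    """Total length of keyspace covered by at least one rowset."""
    total = 0.0
    cur_lo = cur_hi = None
    for rs in sorted(rowsets, key=lambda r: r.min_key):
        if cur_hi is None or rs.min_key > cur_hi:
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = rs.min_key, rs.max_key
        else:
            cur_hi = max(cur_hi, rs.max_key)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


def overlap_removed(rowsets):
    """Drop in integrated height if `rowsets` were rewritten without overlap."""
    return sum(rs.max_key - rs.min_key for rs in rowsets) - union_length(rowsets)


def pick_compaction(rowsets, budget_mb):
    """Pick the contiguous window (by min_key) that removes the most overlap."""
    ordered = sorted(rowsets, key=lambda r: r.min_key)
    best, best_gain = [], 0.0
    for start in range(len(ordered)):
        size = 0
        for end in range(start, len(ordered)):
            size += ordered[end].size_mb
            if size > budget_mb:  # assumed soft limit on compaction input size
                break
            window = ordered[start:end + 1]
            gain = overlap_removed(window)
            if gain > best_gain:
                best, best_gain = window, gain
    return best


a, b = Rowset(0, 10, 40), Rowset(5, 15, 40)
c, d = Rowset(8, 12, 30), Rowset(20, 30, 50)
print(height_at([a, b, c, d], 9))
print(pick_compaction([a, b, c, d], budget_mb=120))
```

Because the abstract describes operation cost as linear in keyspace height, reducing the integrated height is used here as a stand-in for reducing cost.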

    Ensuring properly ordered events in a distributed computing environment

    Publication No.: US10171635B2

    Publication date: 2019-01-01

    Application No.: US14462445

    Application date: 2014-08-18

    Applicant: Cloudera, Inc.

    Abstract: A first event occurs at a first computer at a first time, as measured by a local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by a local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, that is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, and the alternate second time is obtained from the second computer, and the order of the events is determined based on a comparison of the two times.
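The timestamp adjustment described in this abstract can be sketched as follows. The `Node` class, the integer tick clock, and the choice of "first time plus one" as the alternate second time are assumptions made for illustration; the abstract only states that the alternate time is based on the first time.

```python
class Node:
    """A computer with a possibly skewed local clock (hypothetical model)."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the local time in integer ticks
        self._last = 0       # last timestamp this node handed out

    def local_event(self):
        """Timestamp an event that starts on this node."""
        self._last = max(self._clock(), self._last + 1)
        return self._last

    def receive(self, sender_time):
        """Timestamp an event triggered by a message carrying `sender_time`."""
        local = self._clock()
        # If the sender's time is not earlier than our clock, clock error would
        # make the caused event look older than its cause, so an alternate time
        # derived from the sender's time is used instead.
        self._last = max(local, sender_time + 1, self._last + 1)
        return self._last


first = Node(clock=lambda: 100)   # clock running ahead
second = Node(clock=lambda: 90)   # clock running behind

t1 = first.local_event()
t2 = second.receive(t1)
print(t1, t2)  # a third system comparing t1 and t2 sees the first event first
```

A third system that collects `t1` from the first node and `t2` from the second can then order the two events by a plain comparison.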

    Memory allocation buffer for reduction of heap fragmentation
    Type: Granted invention patent
    Status: In force

    Publication No.: US09128949B2

    Publication date: 2015-09-08

    Application No.: US13745461

    Application date: 2013-01-18

    Applicant: Cloudera, Inc.

    Inventor: Todd Lipcon

    Abstract: Systems and methods of a memory allocation buffer to reduce heap fragmentation. In one embodiment, the memory allocation buffer structures a memory arena dedicated to a target region that is one of a plurality of regions in a server in a database cluster such as an HBase cluster. The memory area has a chunk size (e.g., 2 MB) and an offset pointer. Data objects in write requests targeted to the region are received and inserted to the memory arena at a location specified by the offset pointer. When the memory arena is filled, a new one is allocated. When a MemStore of the target region is flushed, the entire memory arenas for the target region are freed up. This reduces heap fragmentation that is responsible for long and/or frequent garbage collection pauses.
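The arena layout in this abstract (fixed-size chunks, an offset pointer, and wholesale release on flush) can be sketched as below. The `RegionArena` class and its handle format are hypothetical, and Python's memory management differs from a JVM heap, so this only illustrates the bump-pointer bookkeeping rather than the garbage-collection benefit.

```python
class RegionArena:
    """Per-region bump-pointer allocator over fixed-size chunks (hypothetical)."""

    def __init__(self, chunk_size=2 * 1024 * 1024):
        self.chunk_size = chunk_size
        self.chunks = []   # list of bytearray chunks owned by this region
        self.offset = 0    # next free position in the newest chunk

    def insert(self, data):
        """Copy `data` into the arena and return a (chunk, start, length) handle."""
        n = len(data)
        if n > self.chunk_size:
            raise ValueError("object larger than one chunk")
        # Start a new chunk when there is none yet or the current one is full.
        if not self.chunks or self.offset + n > self.chunk_size:
            self.chunks.append(bytearray(self.chunk_size))
            self.offset = 0
        start = self.offset
        self.chunks[-1][start:start + n] = data
        self.offset += n
        return (len(self.chunks) - 1, start, n)

    def get(self, handle):
        """Read back the bytes stored under `handle`."""
        chunk_index, start, n = handle
        return bytes(self.chunks[chunk_index][start:start + n])

    def flush(self):
        """Release every chunk of the region at once."""
        self.chunks.clear()
        self.offset = 0


arena = RegionArena(chunk_size=8)   # tiny chunk size to show the rollover
h1 = arena.insert(b"abcd")
h2 = arena.insert(b"efgh")
h3 = arena.insert(b"i")             # does not fit, so a second chunk is allocated
print(h1, h2, h3, arena.get(h2))
arena.flush()
```

Since all objects for a region live in a few large chunks that are released together, freeing them leaves large contiguous holes rather than many small ones.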

    Ensuring properly ordered events in a distributed computing environment

    Publication No.: US12255978B2

    Publication date: 2025-03-18

    Application No.: US18357021

    Application date: 2023-07-21

    Applicant: Cloudera, Inc.

    Abstract: A first event occurs at a first computer at a first time, as measured by a local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by a local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, that is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, and the alternate second time is obtained from the second computer, and the order of the events is determined based on a comparison of the two times.

    ENSURING PROPERLY ORDERED EVENTS IN A DISTRIBUTED COMPUTING ENVIRONMENT

    Publication No.: US20220382323A1

    Publication date: 2022-12-01

    Application No.: US17836909

    Application date: 2022-06-09

    Applicant: Cloudera, Inc.

    Abstract: A first event occurs at a first computer at a first time, as measured by a local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by a local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, that is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, and the alternate second time is obtained from the second computer, and the order of the events is determined based on a comparison of the two times.

    MUTATIONS IN A COLUMN STORE
    Type: Invention patent application

    Publication No.: US20210271653A1

    Publication date: 2021-09-02

    Application No.: US17314813

    Application date: 2021-05-07

    Applicant: Cloudera, Inc.

    Inventor: Todd Lipcon

    Abstract: Columnar storage provides many performance and space saving benefits for analytic workloads, but previous mechanisms for handling single row update transactions in column stores suffer from poor performance. A columnar data layout facilitates both low-latency random access capabilities together with high-throughput analytical access capabilities, simplifying Hadoop architectures for use cases involving real-time data. In disclosed embodiments, mutations within a single row are executed atomically across columns and do not necessarily include the entirety of a row. This allows for faster updates without the overhead of reading or rewriting larger columns.

    Ensuring properly ordered events in a distributed computing environment

    Publication No.: US10681190B2

    Publication date: 2020-06-09

    Application No.: US16198677

    Application date: 2018-11-21

    Applicant: Cloudera, Inc.

    Abstract: A first event occurs at a first computer at a first time, as measured by a local clock. A second event is initiated at a second computer by sending a message that includes the first time. The second event occurs at a second time, as measured by a local clock. Because of clock error, the first time is later than the second time. Based on the first time being later than the second time, an alternate second time, that is based on the first time, is used as the time of the second event. When a third system determines the order of the two events, the first time is obtained from the first computer, and the alternate second time is obtained from the second computer, and the order of the events is determined based on a comparison of the two times.
