Deduplication-Aware Load Balancing in Distributed Storage Systems

    公开(公告)号:US20190026042A1

    公开(公告)日:2019-01-24

    申请号:US15653249

    申请日:2017-07-18

    Applicant: VMware, Inc.

    Abstract: Techniques for enabling deduplication-aware load balancing in a distributed storage system are provided. In one set of embodiments, a node of the distributed storage system can receive an I/O (Input/Output) request pertaining to a data block of a storage object stored on a local storage component of the node. The node can further determine whether the I/O request requires insertion of a new entry into a deduplication hash table associated with the local storage component or deletion of an existing entry from the deduplication hash table. If the I/O request requires insertion of a new hash table entry, the node can add an identifier of the data block into a probabilistic data structure associated with the local storage component, where the probabilistic data structure is configured to maintain information regarding distinct data blocks that are likely present in the local storage component. Alternatively, if the I/O request requires deletion of an existing hash table entry, the node can remove the identifier of the data block from the probabilistic data structure.

    Providing end-to-end checksum within a distributed virtual storage area network module

    公开(公告)号:US10102057B2

    公开(公告)日:2018-10-16

    申请号:US14716756

    申请日:2015-05-19

    Applicant: VMware, Inc.

    Abstract: Exemplary methods, apparatuses, and systems include a first layer of a virtual storage area network (VSAN) module receiving a write request from a data compute node. The write request includes data to be written and the VSAN module is distributed across a plurality of computers to provide an aggregate object store using storage attached to each of the plurality of computers. The first layer of the VSAN module calculates a checksum for the data to be written and passes the data to be written and the checksum to a second layer of the VSAN module. The second layer of the VSAN module calculates a first verification checksum for the data to be written. The data and the checksum are written to persistent storage in response to determining the first verification checksum matches the checksum passed by the first layer of the VSAN module.

    HIGHLY CONCURRENT DATA STORE ALLOCATION

    公开(公告)号:US20220398025A1

    公开(公告)日:2022-12-15

    申请号:US17347491

    申请日:2021-06-14

    Applicant: VMware, Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for executing concurrent writes to a data store. One of the systems includes a data store comprising a plurality of storage segments, wherein each storage segment comprises a plurality of blocks; and an allocator system comprising: a plurality of threads, and a plurality of bitmaps each corresponding to a respective storage segment of the data store, wherein the allocator system is configured to perform operations comprising: assigning a respective bitmap to each thread of the plurality of threads; and executing, by each thread of the plurality of threads, one or more write requests to one or more blocks of the storage segment corresponding to the thread using the bitmap assigned to the thread, wherein executing a write request by a thread includes updating the bitmap assigned to the thread.

    Trading off cache space and write amplification for Bε-trees

    公开(公告)号:US10983909B2

    公开(公告)日:2021-04-20

    申请号:US16252488

    申请日:2019-01-18

    Applicant: VMware, Inc.

    Abstract: Certain aspects provide systems and methods for performing an operation on a Bε-tree. A method comprises writing a message associated with the operation to a first slot in a first buffer of a first non-leaf node of the Bε-tree in an append-only manner, wherein a first filter associated with the first slot is used for query operations associated with the first slot. The method further comprises determining that the first buffer is full and, upon determining to flush the message to a non-leaf child node, flushing the message in an append-only manner to a second slot in a second buffer of the non-leaf child node, wherein a second filter associated with the second slot is used for query operations associated with the second slot. The method further comprises, upon determining to flush the message to a leaf node, flushing the message to the leaf node in a sorted manner.

Patent Agency Ranking