Deduplication-Aware Load Balancing in Distributed Storage Systems

    Publication No.: US20190026042A1

    Publication date: 2019-01-24

    Application No.: US15653249

    Filing date: 2017-07-18

    Applicant: VMware, Inc.

    Abstract: Techniques for enabling deduplication-aware load balancing in a distributed storage system are provided. In one set of embodiments, a node of the distributed storage system can receive an I/O (Input/Output) request pertaining to a data block of a storage object stored on a local storage component of the node. The node can further determine whether the I/O request requires insertion of a new entry into a deduplication hash table associated with the local storage component or deletion of an existing entry from the deduplication hash table. If the I/O request requires insertion of a new hash table entry, the node can add an identifier of the data block into a probabilistic data structure associated with the local storage component, where the probabilistic data structure is configured to maintain information regarding distinct data blocks that are likely present in the local storage component. Alternatively, if the I/O request requires deletion of an existing hash table entry, the node can remove the identifier of the data block from the probabilistic data structure.
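
    The bookkeeping described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not name the probabilistic data structure, so a counting Bloom filter is assumed here because it supports removal. The class names, the SHA-256 fingerprint, and the slot and hash counts are all hypothetical.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter variant with per-slot counters, so identifiers can be removed."""

    def __init__(self, num_slots=1024, num_hashes=3):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.counters = [0] * num_slots

    def _slots(self, block_id: bytes):
        # Derive num_hashes slot indexes by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + block_id).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_slots

    def add(self, block_id: bytes):
        for slot in self._slots(block_id):
            self.counters[slot] += 1

    def remove(self, block_id: bytes):
        for slot in self._slots(block_id):
            if self.counters[slot] > 0:
                self.counters[slot] -= 1

    def might_contain(self, block_id: bytes) -> bool:
        # False means definitely absent; True means probably present.
        return all(self.counters[slot] > 0 for slot in self._slots(block_id))


class LocalComponent:
    """Hypothetical local storage component with a dedup table and a summary."""

    def __init__(self):
        self.dedup_table = {}  # fingerprint -> reference count
        self.summary = CountingBloomFilter()

    def write_block(self, data: bytes) -> bytes:
        fp = hashlib.sha256(data).digest()
        if fp not in self.dedup_table:
            # New hash table entry: also record the block in the summary.
            self.dedup_table[fp] = 1
            self.summary.add(fp)
        else:
            self.dedup_table[fp] += 1
        return fp

    def delete_block(self, fp: bytes):
        self.dedup_table[fp] -= 1
        if self.dedup_table[fp] == 0:
            # Hash table entry deleted: also remove the block from the summary.
            del self.dedup_table[fp]
            self.summary.remove(fp)
```

    The summary changes only when a hash table entry is inserted or deleted, so writes that deduplicate against an existing block leave it untouched. A load balancer could then compare such summaries across nodes to estimate how many distinct blocks a storage object shares with each candidate destination.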

    Deduplication-aware load balancing in distributed storage systems

    Publication No.: US11461027B2

    Publication date: 2022-10-04

    Application No.: US15653249

    Filing date: 2017-07-18

    Applicant: VMware, Inc.

    Abstract: Techniques for enabling deduplication-aware load balancing in a distributed storage system are provided. In one set of embodiments, a node of the distributed storage system can receive an I/O (Input/Output) request pertaining to a data block of a storage object stored on a local storage component of the node. The node can further determine whether the I/O request requires insertion of a new entry into a deduplication hash table associated with the local storage component or deletion of an existing entry from the deduplication hash table. If the I/O request requires insertion of a new hash table entry, the node can add an identifier of the data block into a probabilistic data structure associated with the local storage component, where the probabilistic data structure is configured to maintain information regarding distinct data blocks that are likely present in the local storage component. Alternatively, if the I/O request requires deletion of an existing hash table entry, the node can remove the identifier of the data block from the probabilistic data structure.

    Optimal snapshot deletion
    Granted patent

    Publication No.: US11010334B2

    Publication date: 2021-05-18

    Application No.: US15947072

    Filing date: 2018-04-06

    Applicant: VMware, Inc.

    Abstract: Embodiments described herein involve improved management of snapshots of a file system. Embodiments include copying a first root node of a first snapshot to a second snapshot, the second snapshot referencing other nodes of the first snapshot. Embodiments further include incrementing reference counts of the other nodes of the first snapshot. Embodiments further include adding a storage address of the first root node to a list. Embodiments further include, each time that a copy on write operation is performed for a node of the other nodes, adding a storage address of the node to the list and decrementing the reference count of the node. Embodiments further include iterating through the list and, for each storage address in the list, decrementing the reference count of the node corresponding to the storage address and, if the reference count of the node reaches zero, freeing storage space at the storage address.
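
    The reference-count bookkeeping in this abstract can be traced with a small simulation. This is a simplified sketch under stated assumptions: nodes are modeled as bare storage addresses, the tree structure is omitted, and the `SnapshotStore` class and its method names are hypothetical.

```python
class SnapshotStore:
    """Tracks reference counts per storage address (tree structure omitted)."""

    def __init__(self):
        self.refcount = {}  # storage address -> reference count
        self.freed = []     # addresses whose storage space has been released
        self.next_addr = 0

    def allocate(self) -> int:
        addr = self.next_addr
        self.next_addr += 1
        self.refcount[addr] = 1
        return addr

    def create_snapshot(self, root, others, pending):
        """Copy the root for the second snapshot and share the other nodes."""
        new_root = self.allocate()        # copy of the first root node
        for addr in others:
            self.refcount[addr] += 1      # other nodes are now referenced twice
        pending.append(root)              # first root goes on the list
        return new_root

    def copy_on_write(self, addr, pending):
        """Give the writer a private copy of a shared node."""
        new_addr = self.allocate()
        pending.append(addr)              # remember the superseded node
        self.refcount[addr] -= 1
        return new_addr

    def delete_snapshot(self, pending):
        """Walk the list, dropping one reference per entry and freeing at zero."""
        for addr in pending:
            self.refcount[addr] -= 1
            if self.refcount[addr] == 0:
                self.freed.append(addr)
        pending.clear()


store = SnapshotStore()
root, a, b = store.allocate(), store.allocate(), store.allocate()
pending = []
store.create_snapshot(root, [a, b], pending)
store.copy_on_write(a, pending)
store.delete_snapshot(pending)
```

    In this trace, deletion visits only the addresses on the list (the old root and the node that was copied on write) rather than walking every node of the snapshot. Node `b`, which was never copied, stays shared and is not freed.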

    SNAPSHOTS AND CLONES IN A BLOCK-BASED DATA DEDUPLICATION STORAGE SYSTEM
    Patent application (under examination, published)

    Publication No.: US20160350006A1

    Publication date: 2016-12-01

    Application No.: US14726572

    Filing date: 2015-05-31

    Applicant: VMware, Inc.

    Abstract: A deduplication storage system with snapshot and clone capability includes storing logical pointer objects and organizing a first set of the logical pointer objects into a hierarchical structure. A second set of the logical pointer objects may be associated with corresponding logical data blocks of a client data object. The second set of the logical pointer objects may point to physical data blocks having deduplicated data that comprise data of the corresponding logical data blocks. Some of the logical pointer objects in the first set may point to the logical pointer objects in the second set, so that the hierarchical structure represents the client data object. A root of the hierarchical structure may be associated with the client data object. A snapshot or clone may be created by making a copy of the root and associating the copied root with the snapshot or clone.
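
    The root-copy idea in this abstract can be illustrated with a two-level hierarchy. This is a minimal sketch, not the claimed design: it assumes a single interior level, SHA-256 content hashes as physical block keys, and the hypothetical names `IndexNode`, `LeafPointer`, and `DedupStore`.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class LeafPointer:
    """Second-set pointer: maps one logical block to a deduplicated physical block."""
    physical_key: str


@dataclass
class IndexNode:
    """First-set pointer: a node of the hierarchy that points at leaf pointers."""
    children: dict = field(default_factory=dict)


class DedupStore:
    def __init__(self):
        self.physical = {}  # content hash -> block data, stored once
        self.roots = {}     # object, snapshot, or clone name -> root IndexNode

    def write(self, name, logical_block, data: bytes):
        key = hashlib.sha256(data).hexdigest()
        self.physical.setdefault(key, data)  # deduplicate identical content
        root = self.roots.setdefault(name, IndexNode())
        # Install a fresh leaf pointer; leaf pointers shared with snapshots
        # are never modified in place.
        root.children[logical_block] = LeafPointer(key)

    def snapshot(self, name, snap_name):
        # Copy only the root; the copy shares every leaf pointer with the source.
        source_root = self.roots[name]
        self.roots[snap_name] = IndexNode(children=dict(source_root.children))

    def read(self, name, logical_block) -> bytes:
        leaf = self.roots[name].children[logical_block]
        return self.physical[leaf.physical_key]


store = DedupStore()
store.write("vm-disk", 0, b"original")
store.snapshot("vm-disk", "snap-1")
store.write("vm-disk", 0, b"changed")
```

    After the second write, `store.read("snap-1", 0)` still returns the original data while the live object returns the new data. Creating the snapshot copied one root node and no block data.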


    System and method for speed up data rebuild in a distributed storage system with local deduplication

    Publication No.: US11474724B2

    Publication date: 2022-10-18

    Application No.: US15880391

    Filing date: 2018-01-25

    Applicant: VMware, Inc.

    Abstract: A method includes obtaining a plurality of representations corresponding respectively to a plurality of blocks of data stored on a source node. A plurality of data pairs are sent to a destination node, where each data pair includes a logical address associated with a block of data from the plurality of blocks of data and the corresponding representation of the block of data. A determination is made whether the blocks of data associated with the respective logical addresses are duplicates of data stored on the destination node. In accordance with an affirmative determination, a reference to a physical address of the block of data stored on the destination node is stored. In accordance with a negative determination, an indication that the data corresponding to the respective logical address is not a duplicate is stored. The data indicated as not being a duplicate is written to the destination node.
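
    The exchange described in this abstract can be sketched in a single process. This is an illustration under stated assumptions: the "representation" of a block is assumed to be a SHA-256 hash, network transfer is replaced by direct method calls, and `DestinationNode` and `rebuild` are hypothetical names.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Assumed block representation: a SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()


class DestinationNode:
    def __init__(self):
        self.blocks = {}          # physical address -> block data
        self.by_fingerprint = {}  # fingerprint -> physical address
        self.logical_map = {}     # logical address -> physical address

    def check_pairs(self, pairs):
        """Reference existing blocks for duplicates; report the rest as missing."""
        missing = []
        for logical, fp in pairs:
            if fp in self.by_fingerprint:
                # Affirmative determination: store a reference, no data needed.
                self.logical_map[logical] = self.by_fingerprint[fp]
            else:
                # Negative determination: the data must be sent.
                missing.append(logical)
        return missing

    def write(self, logical, data: bytes):
        fp = fingerprint(data)
        if fp not in self.by_fingerprint:
            addr = len(self.blocks)
            self.blocks[addr] = data
            self.by_fingerprint[fp] = addr
        self.logical_map[logical] = self.by_fingerprint[fp]


def rebuild(source_blocks, dest):
    """Send (logical address, fingerprint) pairs first, then only the missing data."""
    pairs = [(logical, fingerprint(data)) for logical, data in source_blocks.items()]
    missing = dest.check_pairs(pairs)
    for logical in missing:
        dest.write(logical, source_blocks[logical])
    return missing


dest = DestinationNode()
dest.write(100, b"shared")  # block already present on the destination
sent = rebuild({0: b"shared", 1: b"unique"}, dest)
```

    Here only logical address 1 has its data transferred. Address 0 is resolved to the physical block the destination already holds, which is where the rebuild speedup comes from.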
