Abstract:
The embodiments described herein are directed to efficient merging of metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata managed by the volume layer, i.e., the volume metadata, is illustratively organized as a multi-level dense tree metadata structure, wherein each level of the dense tree metadata structure (dense tree) includes volume metadata entries for storing the volume metadata. The volume metadata entries of an upper level of the dense tree metadata structure are merged with the volume metadata entries of a next lower level of the dense tree metadata structure when the upper level is full. The volume metadata entries of the merged levels are organized as metadata pages and stored as one or more files on solid-state drives (SSDs).
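As a minimal sketch of the merge described above, each level may be modeled as a mapping from an offset to a volume metadata entry. The names `insert`, `levels`, and `capacities`, the use of in-memory dictionaries, and the definition of "full" as reaching a fixed entry count are assumptions made for illustration; the organization of merged entries into metadata pages and files is not modeled.

```python
# Hypothetical model: levels[0] is the top level; each level maps offset -> entry.
def insert(levels, capacities, offset, entry):
    """Insert an entry into the top level, merging any full level downward."""
    levels[0][offset] = entry
    for i in range(len(levels) - 1):
        if len(levels[i]) < capacities[i]:
            break  # this level still has room, so no merge is needed
        # Entries in the upper level are newer, so they override
        # entries for the same offset in the next lower level.
        levels[i + 1].update(levels[i])
        levels[i].clear()


levels = [{}, {}]
capacities = [2, 8]  # assumed entry limits per level
insert(levels, capacities, 0, "K1")
insert(levels, capacities, 4, "K2")  # top level becomes full and is merged
insert(levels, capacities, 0, "K3")  # newer entry for offset 0 stays on top
```

In this sketch the lowest level is never merged further; a real implementation would bound or grow it by other means.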
Abstract:
In one embodiment, a node coupled to one or more storage devices executes a storage input/output (I/O) stack having a volume layer that manages volume metadata. The volume metadata is organized as one or more dense tree metadata structures having a top level residing in memory and lower levels residing on the one or more storage devices. The dense tree metadata structures include a first dense tree metadata structure associated with a parent volume and a second dense tree metadata structure associated with a copy of the parent volume. The top level of the first dense tree metadata structure may be copied to the second dense tree metadata structure. The lower levels of the first dense tree metadata structure are initially shared with the second dense tree metadata structure. The shared lower levels may eventually be split as the parent volume diverges from the copy of the parent volume.
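The sharing of lower levels between a parent volume and its copy may be sketched as follows. The class `DenseTree`, its `copy` and `split` methods, and the use of a single dictionary to stand in for all lower levels are hypothetical and serve only to illustrate the copy-then-share-then-split sequence.

```python
class DenseTree:
    """Hypothetical dense tree with a private top level and shareable lower levels."""

    def __init__(self, top=None, lower=None):
        self.top = dict(top or {})  # in-memory top level, always private
        # Lower levels are kept by reference so that they can be shared.
        self.lower = lower if lower is not None else {}

    def copy(self):
        # The top level is copied; the lower levels are initially shared.
        return DenseTree(top=self.top, lower=self.lower)

    def split(self):
        # Give this tree its own lower levels once the volumes diverge.
        self.lower = dict(self.lower)


parent = DenseTree(top={0: "K9"}, lower={8: "K1"})
clone = parent.copy()
shared_before = clone.lower is parent.lower  # lower levels are shared
clone.top[0] = "K7"                          # the copy diverges from the parent
clone.split()
shared_after = clone.lower is parent.lower   # lower levels are now separate
```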
Abstract:
In one embodiment, snapshots and/or clones of storage objects are created and managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. Illustratively, the snapshots and clones may be represented as independent volumes, and embodied as respective read-only copies (snapshots) and read-write copies (clones) of a parent volume. Volume metadata is illustratively organized as one or more multi-level dense tree metadata structures, wherein each level of the dense tree metadata structure (dense tree) includes volume metadata entries for storing the metadata. Each snapshot/clone may be derived from a dense tree of the parent volume (parent dense tree). Portions of the parent dense tree may be shared with the snapshot/clone.
Abstract:
The embodiments described herein are directed to efficient logging and checkpointing of metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata managed by the volume layer, i.e., the volume metadata, is illustratively organized as a multi-level dense tree metadata structure, wherein each level of the dense tree metadata structure (dense tree) includes volume metadata entries for storing the volume metadata. Each volume metadata entry may be a descriptor that embodies one of a plurality of types, including a data entry, an index entry, and a hole (i.e., absence of data) entry.
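The three entry types named above may be sketched as simple descriptors. All field names (`offset`, `length`, `extent_key`, `page_key`) are assumptions introduced for illustration and are not taken from the abstract; logging and checkpointing are not modeled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DataEntry:
    offset: int
    length: int
    extent_key: str  # hypothetical reference to stored data


@dataclass(frozen=True)
class IndexEntry:
    offset: int
    length: int
    page_key: str  # hypothetical reference to a lower-level metadata page


@dataclass(frozen=True)
class HoleEntry:
    offset: int
    length: int  # range with no data


Entry = Union[DataEntry, IndexEntry, HoleEntry]


def describe(entry: Entry) -> str:
    """Return the descriptor type of a volume metadata entry."""
    if isinstance(entry, DataEntry):
        return "data"
    if isinstance(entry, IndexEntry):
        return "index"
    return "hole"
```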
Abstract:
In one embodiment, an extent hashing technique is used to efficiently distribute data and associated metadata substantially evenly among nodes of a cluster. The data may be write data associated with a write request issued by a host and received at a node of the cluster. The write data may be organized into one or more extents. A hash function may be applied to each extent to generate a result which may be truncated or trimmed to generate a hash value. A hash space of the hash value may be divided into a plurality of buckets representative of the write data, i.e., the extents, and the associated metadata, i.e., extent metadata. A number of buckets may be assigned to each extent store instance of the nodes to distribute ownership of the buckets, along with their extents and extent metadata, across all of the extent store instances of the nodes.
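A minimal sketch of the extent hashing technique follows. The choice of SHA-256, the 6-byte truncation, the use of a modulo to divide the hash space into buckets, the round-robin bucket assignment, and the names `hash_value`, `bucket_of`, `owner_of`, and `INSTANCES` are all assumptions made for illustration.

```python
import hashlib

NUM_BUCKETS = 16  # assumed number of buckets
INSTANCES = ["store-0", "store-1", "store-2", "store-3"]  # hypothetical instances


def hash_value(extent: bytes, nbytes: int = 6) -> int:
    """Hash the extent and truncate the result to nbytes bytes."""
    digest = hashlib.sha256(extent).digest()
    return int.from_bytes(digest[:nbytes], "big")


def bucket_of(extent: bytes) -> int:
    """Map the hash value to one of the buckets dividing the hash space."""
    return hash_value(extent) % NUM_BUCKETS


# Assign an equal number of buckets to each extent store instance.
OWNER = {b: INSTANCES[b % len(INSTANCES)] for b in range(NUM_BUCKETS)}


def owner_of(extent: bytes) -> str:
    """Return the extent store instance that owns the extent's bucket."""
    return OWNER[bucket_of(extent)]
```

Because ownership is attached to buckets rather than to individual extents, reassigning a bucket moves its extents and extent metadata together.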
Abstract:
In one embodiment, a node coupled to one or more storage devices executes a storage input/output (I/O) stack having a volume layer. The volume layer manages volume metadata embodied as mappings from offsets of a logical unit (LUN) to extent keys associated with storage locations for extents on the one or more storage devices. Volume metadata is maintained as a dense tree metadata structure representing successive points in time. The dense tree metadata structure has multiple levels, wherein a top level of the dense tree metadata structure represents newer volume metadata changes and descending levels of the dense tree metadata structure represent older volume metadata changes. The node accesses a latest version of changes to the volume metadata by searching from the top level to the descending levels in the dense tree metadata structure.
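The top-down search may be sketched as follows, assuming each level is modeled as a dictionary from a LUN offset to an extent key. The function name `lookup` and the sample keys are hypothetical.

```python
def lookup(levels, offset):
    """Search from the top level downward and return the newest mapping."""
    for level in levels:  # levels[0] is the top (newest) level
        if offset in level:
            return level[offset]
    return None  # no mapping exists for this offset


levels = [
    {0: "K3"},           # top level: newest changes
    {0: "K1", 4: "K2"},  # lower level: older changes
]
```

Here `lookup(levels, 0)` returns `"K3"` rather than the older `"K1"`, because the search stops at the first level that contains the offset.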
Abstract:
A technique preserves efficiency for replication of data between a source node of a source cluster (“source”) and a destination node of a destination cluster (“destination”) of a clustered network. Replication in the clustered network may be effected by leveraging global in-line deduplication at the source to identify and avoid copying duplicate data from the source to the destination. To ensure that the copy of the data on the destination is synchronized with the data received at the source, the source creates a snapshot of the data for use as a baseline copy at the destination. Thereafter, new data received at the source that differs from the baseline snapshot is transmitted and copied to the destination. In addition, the source and destination nodes negotiate to establish a mapping of name-to-data when transferring data (i.e., an extent) between the clusters. Illustratively, the name is an extent key for the extent, such that the negotiated mapping established by the source and destination is based on the extent key associated with the extent.
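The effect of the negotiated name-to-data mapping may be sketched as follows. The derivation of the extent key from a SHA-256 digest of the extent, the modeling of the destination as a dictionary, and the names `extent_key` and `replicate` are assumptions made for illustration; the abstract does not specify how keys are computed or exchanged.

```python
import hashlib


def extent_key(extent: bytes) -> str:
    """Hypothetical extent key derived from the extent's content."""
    return hashlib.sha256(extent).hexdigest()


def replicate(extents, destination):
    """Copy only extents whose keys the destination does not already hold.

    Returns the number of extents actually transmitted.
    """
    sent = 0
    for extent in extents:
        key = extent_key(extent)
        if key not in destination:  # stands in for the key negotiation
            destination[key] = extent
            sent += 1
    return sent


destination = {}
baseline_sent = replicate([b"aaaa", b"bbbb", b"aaaa"], destination)  # baseline copy
delta_sent = replicate([b"aaaa", b"cccc"], destination)              # later changes
```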
Abstract:
A first plurality of block identifiers is sorted based, at least in part, on a measure of spatial locality. A second plurality of block identifiers is sorted based, at least in part, on the measure of spatial locality. At least the first plurality of block identifiers and the second plurality of block identifiers are incrementally merged into a third plurality of block identifiers based, at least in part, on the measure of spatial locality. A block of data corresponding to metadata associated with a plurality of block identifiers of the third plurality of block identifiers is updated.
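A minimal sketch of the sort-and-merge sequence follows, assuming that the measure of spatial locality is simply the numeric block identifier and that a fixed number of consecutive block identifiers share one metadata block. The constant `BLOCKS_PER_METADATA_BLOCK` and the helper `metadata_block` are hypothetical.

```python
import heapq
from itertools import groupby

BLOCKS_PER_METADATA_BLOCK = 4  # assumed grouping of identifiers per metadata block


def metadata_block(block_id: int) -> int:
    """Return the metadata block that covers a block identifier."""
    return block_id // BLOCKS_PER_METADATA_BLOCK


first = sorted([9, 1, 14])   # first plurality, sorted by locality
second = sorted([2, 8, 3])   # second plurality, sorted by locality

# heapq.merge yields the merged sequence lazily, i.e., incrementally.
merged = heapq.merge(first, second)

# Adjacent identifiers that fall in the same metadata block are grouped,
# so each metadata block is updated once for all of its identifiers.
updates = [(mb, list(ids)) for mb, ids in groupby(merged, key=metadata_block)]
```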
Abstract:
The embodiments described herein are directed to an organization of metadata managed by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The metadata managed by the volume layer, i.e., the volume metadata, is illustratively embodied as mappings from addresses, i.e., logical block addresses (LBAs), of a logical unit (LUN) accessible by a host to durable extent keys maintained by an extent store layer of the storage I/O stack. In an embodiment, the volume layer organizes the volume metadata as a mapping data structure, i.e., a dense tree metadata structure, which represents successive points in time to enable efficient access to the metadata.
Abstract:
In one embodiment, a technique is provided for distributing data and associated metadata within a distributed storage architecture. A set of hash tables that embody mappings of cluster-wide identifiers associated with storage locations are stored for write data of write requests organized into extents. A hash value is generated from a hash function applied to each extent. The hash value is overloaded and used for multiple purposes within the distributed storage architecture, including (i) a remainder computation on the hash value to select a bucket of a plurality of buckets representative of the extents, (ii) a hash table selector of the hash value to select a hash table from the set of hash tables, and (iii) a hash table index computed from the hash value to select an entry from a plurality of entries of the selected hash table having a cluster-wide identifier identifying a storage location for the extent.
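The three uses of the overloaded hash value may be sketched as follows. The hash function (SHA-256 truncated to 6 bytes), the bit positions chosen for the table selector and table index, and the constants `NUM_BUCKETS`, `NUM_TABLES`, and `TABLE_SIZE` are assumptions made for illustration and are not specified by the abstract.

```python
import hashlib

NUM_BUCKETS = 7    # assumed bucket count
NUM_TABLES = 4     # assumed: selected by 2 bits of the hash value
TABLE_SIZE = 256   # assumed: indexed by the low 8 bits of the hash value


def hash_value(extent: bytes) -> int:
    """Hash the extent and keep the first 6 bytes as the hash value."""
    return int.from_bytes(hashlib.sha256(extent).digest()[:6], "big")


def locate(extent: bytes):
    """Derive the bucket, hash table selector, and table index from one hash value."""
    h = hash_value(extent)
    bucket = h % NUM_BUCKETS             # (i) remainder computation
    table = (h >> 8) & (NUM_TABLES - 1)  # (ii) hash table selector bits
    index = h & (TABLE_SIZE - 1)         # (iii) hash table index bits
    return bucket, table, index
```

Taking the selector and index from disjoint bit fields of the same hash value keeps the two choices independent of each other in this sketch.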