Abstract:
In one embodiment, a layered file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The layered file system includes a flash-optimized, log-structured layer configured to provide sequential storage of data and metadata (i.e., a log-structured layout) on solid state drives (SSDs) of storage arrays in the cluster to reduce write amplification, while leveraging a data de-duplication feature of the storage I/O stack. The data may be organized as variable-length extents of one or more host-visible logical units (LUNs), each extent identified by an extent key. An extent store layer of the file system performs and maintains mappings of the extent keys to SSD storage locations, while a volume layer of the file system performs and maintains mappings of LUN offset ranges to the extent keys. Separation of the mapping functions between the volume and extent store layers enables different volumes with different offset ranges to reference a same extent key (and thus a same extent).
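A minimal sketch of this two-level mapping split, assuming simple in-memory dictionaries (the names ExtentStore, VolumeLayer, and put_extent are illustrative, not from the source): the volume layer maps LUN offset ranges to extent keys, the extent store maps extent keys to SSD locations, and two volumes referencing identical data end up sharing one extent key.

```python
import hashlib

class ExtentStore:
    """Extent store layer: maps extent keys to SSD storage locations."""
    def __init__(self):
        self.key_to_location = {}   # extent key -> (ssd_id, offset)
        self.extents = {}           # stand-in for data written to SSD

    def put_extent(self, data, location):
        # Key derived from content, so identical extents share one key.
        key = hashlib.sha256(data).hexdigest()[:12]
        if key not in self.key_to_location:    # de-duplication
            self.key_to_location[key] = location
            self.extents[key] = data
        return key

class VolumeLayer:
    """Volume layer: maps (LUN, offset) to extent keys, never to locations."""
    def __init__(self, extent_store):
        self.store = extent_store
        self.offset_map = {}        # (lun, offset) -> extent key

    def write(self, lun, offset, data, location):
        self.offset_map[(lun, offset)] = self.store.put_extent(data, location)

store = ExtentStore()
vol_a, vol_b = VolumeLayer(store), VolumeLayer(store)
vol_a.write("lun0", 0, b"hello", ("ssd0", 4096))
vol_b.write("lun1", 8192, b"hello", ("ssd1", 0))   # duplicate write data
# Different volumes/offsets reference the same extent key (one copy stored).
assert vol_a.offset_map[("lun0", 0)] == vol_b.offset_map[("lun1", 8192)]
assert len(store.extents) == 1
```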
Abstract:
Non-volatile random access memory (NVRAM) caching and logging may be configured to deliver low latency acknowledgements of input/output (I/O) requests, such as write requests, while avoiding loss of data associated with the requests that may occur as a result of power failures. Write data associated with one or more write requests may be received at a node of a cluster. The write data may be stored in a portion of an NVRAM configured as, e.g., a persistent write-back cache of the node, while parameters of the request may be stored in another portion of the NVRAM configured as one or more logs, e.g., NVLogs. The write data may be organized into separate variable-length blocks or extents and "written back" out-of-order from the write-back cache to storage devices, such as solid state drives (SSDs). The write data may be preserved in the write-back cache until each extent is safely and successfully stored on SSD, or until operations associated with the write request are sufficiently logged on NVLog, thereby providing efficient recovery (e.g., in the event of power loss) when attempting to restore the write data preserved in the cache to the SSDs.
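A minimal sketch of the acknowledge-then-write-back flow, with the NVRAM regions modeled as ordinary Python containers and the names handle_write and flush invented for illustration: the request is acknowledged once its data sits in the persistent cache and its parameters in the NVLog, extents are flushed out of order, and a cache entry is evicted only after its extent is durable on SSD.

```python
from collections import OrderedDict

write_back_cache = OrderedDict()   # models the NVRAM persistent write-back cache
nvlog = []                         # models an NVLog of request parameters
ssd = {}                           # models extents durably stored on SSD

def handle_write(req_id, lun, offset, data):
    # 1. Stage write data in the (persistent) write-back cache.
    write_back_cache[req_id] = (lun, offset, data)
    # 2. Log the request parameters to NVLog.
    nvlog.append({"req": req_id, "lun": lun, "offset": offset, "len": len(data)})
    # 3. Low-latency acknowledgement: data would survive power loss from here on.
    return "ACK"

def flush(req_id):
    # Extents may be written back out of order relative to arrival.
    lun, offset, data = write_back_cache[req_id]
    ssd[(lun, offset)] = data          # durable on SSD
    del write_back_cache[req_id]       # safe to evict only now

handle_write(1, "lun0", 0, b"aaaa")
handle_write(2, "lun0", 4096, b"bbbb")
flush(2)                               # out-of-order write-back
flush(1)
assert not write_back_cache and len(ssd) == 2
```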
Abstract:
In one embodiment, a flash-optimized, log-structured layer of a file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The log-structured layer of the file system provides sequential storage of data and metadata on solid state drives (SSDs) to reduce write amplification, while leveraging variable compression and variable length data features of the storage I/O stack. The data may be organized as an arbitrary number of variable-length extents of one or more host-visible logical units (LUNs). The metadata may include mappings from host-visible logical block address ranges of a LUN to extent keys, as well as mappings of the extent keys to SSD storage locations of the extents. The storage location of an extent on SSD is effectively "virtualized" by its mapped extent key such that relocation of the extent on SSD does not require an update to volume layer metadata.
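A minimal sketch of this "virtualized" storage location, with the names volume_metadata, key_to_location, and relocate assumed for illustration: because the volume layer records only extent keys, moving an extent on SSD (e.g., during log-structured cleaning) changes only the extent store's key-to-location map.

```python
# Volume layer metadata: LUN offset ranges -> extent keys (no locations).
volume_metadata = {("lun0", 0): "key-42"}
# Extent store metadata: extent keys -> SSD storage locations.
key_to_location = {"key-42": ("ssd0", 4096)}

def relocate(key, new_location):
    # Log-structured cleaning moves the extent; only this map changes.
    key_to_location[key] = new_location

before = dict(volume_metadata)
relocate("key-42", ("ssd1", 0))
# The volume layer metadata is untouched by the relocation.
assert volume_metadata == before
assert key_to_location["key-42"] == ("ssd1", 0)
```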
Abstract:
The embodiments described herein are directed to the use of hashing in a file system metadata arrangement that reduces the amount of metadata stored in a memory of a node in a cluster and that reduces the amount of metadata needed to process an input/output (I/O) request at the node. Illustratively, the embodiments are directed to cuckoo hashing and, in particular, to a manner in which cuckoo hashing may be modified and applied to construct the file system metadata arrangement. In an embodiment, the file system metadata arrangement may be illustratively configured as a key-value extent store embodied as a data structure, e.g., a cuckoo hash table, wherein a value, such as a hash table index, may be applied to the cuckoo hash table to obtain a key, such as an extent key, that references a location of an extent on one or more storage devices, such as solid state drives (SSDs).
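A minimal sketch of cuckoo hashing as a building block for such a key-value extent store (two candidate positions per key, with displacement on collision); the table sizes, hash derivations, and MAX_KICKS limit below are assumptions for illustration, not the patented arrangement.

```python
import hashlib

TABLE_SIZE = 8
MAX_KICKS = 16
tables = ([None] * TABLE_SIZE, [None] * TABLE_SIZE)

def _positions(key: bytes):
    d = hashlib.sha256(key).digest()
    # Two candidate positions derived from one digest (an assumption here).
    return d[0] % TABLE_SIZE, d[1] % TABLE_SIZE

def insert(key: bytes, value) -> bool:
    item, which = (key, value), 0
    for _ in range(MAX_KICKS):
        h = _positions(item[0])[which]
        if tables[which][h] is None:
            tables[which][h] = item
            return True
        # Displace the occupant and re-place it in the other table
        # (the "cuckoo" step that keeps lookups at two probes).
        tables[which][h], item = item, tables[which][h]
        which ^= 1
    return False  # insertion failed; a real store would rehash or grow

def lookup(key: bytes):
    h0, h1 = _positions(key)
    for slot in (tables[0][h0], tables[1][h1]):
        if slot is not None and slot[0] == key:
            return slot[1]
    return None

insert(b"extent-A", "extent-key-A")
assert lookup(b"extent-A") == "extent-key-A"
```

A lookup inspects at most two slots regardless of table occupancy, which is what makes the structure attractive for bounding the metadata touched per I/O request.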
Abstract:
In one embodiment, a technique is provided for distributing data and associated metadata within a distributed storage architecture. A set of hash tables embodying mappings of cluster-wide identifiers to storage locations is stored for write data of write requests organized into extents. A hash value is generated by applying a hash function to each extent. The hash value is overloaded, i.e., used for multiple purposes within the distributed storage architecture, including (i) a remainder computation on the hash value to select a bucket of a plurality of buckets representative of the extents, (ii) a hash table selector extracted from the hash value to select a hash table from the set of hash tables, and (iii) a hash table index computed from the hash value to select an entry from a plurality of entries of the selected hash table having a cluster-wide identifier identifying a storage location for the extent.
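A minimal sketch of overloading a single hash value for all three purposes; the bit widths, field boundaries, and counts below are assumptions chosen for illustration.

```python
import hashlib

NUM_BUCKETS = 1024          # cluster-wide buckets (assumed count, 2**10)
NUM_TABLES = 16             # hash tables in the set (assumed count, 2**4)
TABLE_ENTRIES = 2 ** 20     # entries per hash table (assumed count)

def hash_extent(extent: bytes) -> int:
    # One 64-bit hash value computed over the extent's contents.
    return int.from_bytes(hashlib.sha256(extent).digest()[:8], "big")

def overload(h: int):
    bucket = h % NUM_BUCKETS             # (i) remainder computation -> bucket
    table = (h >> 10) % NUM_TABLES       # (ii) next 4 bits -> table selector
    index = (h >> 14) % TABLE_ENTRIES    # (iii) next 20 bits -> table index
    return bucket, table, index

h = hash_extent(b"some extent data")
bucket, table, index = overload(h)
print(f"bucket={bucket} table={table} index={index}")
```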
Abstract:
In one embodiment, an extent key reconstruction technique is provided for use with a set of hash tables embodying metadata. The metadata includes an extent key associated with a storage location on storage devices for write data of one or more write requests organized into an extent. Each hash table has a plurality of entries, and each entry includes a plurality of slots. A first field of the extent key is recreated implicitly from the location of an entry in a first address space portion of a hash table. A second field of the extent key is stored in a slot of the entry. A third field of the extent key is also stored in the slot. A fourth field of the extent key is recreated implicitly from which hash table of the set of hash tables contains the entry.
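A minimal sketch of the reconstruction idea: two fields come "for free" from where the entry lives (its index within the table, and which table of the set it is in), so only the remaining two fields need to occupy the slot. The 20/16/24/4-bit layout below is an assumed encoding for illustration, not the patented one.

```python
NUM_TABLES = 16          # assumed set size  -> 4 implicit bits (fourth field)
TABLE_ENTRIES = 2 ** 20  # assumed table size -> 20 implicit bits (first field)

def split_key(key: int):
    """Split a 64-bit extent key into two implicit and two stored fields."""
    index = key & (TABLE_ENTRIES - 1)   # first field: implicit in entry position
    rest = key >> 20
    lo = rest & 0xFFFF                  # second field: stored in the slot
    rest >>= 16
    hi = rest & 0xFFFFFF                # third field: also stored in the slot
    rest >>= 24
    table = rest & (NUM_TABLES - 1)     # fourth field: implicit in table identity
    return index, lo, hi, table

def reconstruct(index: int, lo: int, hi: int, table: int) -> int:
    # Recreate the extent key: the slot holds only lo/hi; index and table
    # are recovered from where the slot was found.
    return (((table << 24 | hi) << 16 | lo) << 20) | index

key = 0x00AB_CDEF_0123_4567
assert reconstruct(*split_key(key)) == key
```

Storing only two of the four fields halves (under this assumed layout) the per-slot key footprint, which is the point of the technique: less metadata resident in memory per extent.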
Abstract:
Data consistency and availability can be provided at the granularity of logical storage objects in storage solutions that use storage virtualization in clustered storage environments. To ensure consistency of data across different storage elements, synchronization is performed across the different storage elements. Changes to data are synchronized across storage elements in different clusters by propagating the changes from a primary logical storage object to a secondary logical storage object. To satisfy the strictest recovery point objectives (RPOs) while maintaining performance, change requests are intercepted prior to being sent to a filesystem that hosts the primary logical storage object and are propagated to a different managing storage element associated with the secondary logical storage object.
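A minimal sketch of the interception point, with the names intercept_write, primary_fs, and propagate invented for illustration: the change request is propagated to the secondary's managing storage element before being handed to the filesystem hosting the primary, and the acknowledgement is withheld until both sides hold the change.

```python
primary_fs = {}    # stand-in for the filesystem hosting the primary object
secondary = {}     # stand-in for the managing storage element of the secondary

def propagate(obj, offset, data) -> bool:
    # In a real deployment this is a network hop to another cluster.
    secondary[(obj, offset)] = data
    return True

def intercept_write(obj, offset, data):
    # Intercept BEFORE the request reaches the primary's filesystem and
    # propagate it, so a strict (zero-data-loss) RPO can be met.
    ok = propagate(obj, offset, data)
    primary_fs[(obj, offset)] = data
    # Acknowledge only once both sides have the change.
    return "ACK" if ok else "ERROR"

intercept_write("lun0", 0, b"update")
assert primary_fs[("lun0", 0)] == secondary[("lun0", 0)]
```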