Abstract:
The technique introduced here includes a system and method for identifying and mapping duplicate data objects referenced by data objects. The technique illustratively utilizes a hierarchical tree of fingerprints for each data object to compare the data objects and identify duplicate data blocks referenced by the data objects. A progressive comparison of the hierarchical trees starts from a top layer of the hierarchical trees and proceeds toward a base layer. Between the compared data objects (i.e., the compared hierarchical trees), the technique maps matching fingerprints only at the top-most layer of the hierarchical trees at which the fingerprints match. Lower layer matching fingerprints are neither compared nor mapped. Data blocks corresponding to the matching fingerprints are then deleted. Such an identification and mapping technique substantially reduces the amount of mapping metadata stored in data objects that have been subject to deduplication.
Abstract:
Described herein are systems and methods for providing data policy management over application objects in a storage system environment. An application object may comprise non-virtual or virtual objects (e.g., non-virtual-based applications, virtual-based applications, or virtual storage components). An application object manager may represent application objects by producing mapping graphs and/or application object data that represent application objects in a standardized manner. A mapping graph for an application object may describe a mapping between the application object and its underlying storage objects on a storage system. Application object data may describe a mapping graph in a standardized format. Application object data representing application objects may be received by an application policy manager that manages data policies on the application objects (including virtual applications and virtual storage components) based on the received application object data. Data policies may include policies for backup, service level objectives, recovery, monitoring and/or reporting.
Abstract:
A novel RDMA connection failover technique that minimizes disruption to upper subsystem modules (executed on a computer node), which create requests for data transfer. A new failover virtual layer performs failover of an RDMA connection in error so that the upper subsystem that created a request does not have knowledge of an error (which is recoverable in software and hardware), or of a failure on the RDMA connection due to the error. Since the upper subsystem does not have knowledge of a failure on the RDMA connection or of a performed failover of the RDMA connection, the upper subsystem continues providing requests to the failover virtual layer without interruption, thereby minimizing downtime of the data transfer activity.
Abstract:
A storage server is coupled to a storage device that stores blocks of data, and generates a fingerprint for each data block stored on the storage device. The storage server creates a fingerprints datastore that is divided into a primary datastore and a secondary datastore. The primary datastore comprises a single entry for each unique fingerprint and the secondary datastore comprises an entry having an identical fingerprint as an entry in the primary datastore. The storage server merges entries in a changelog with the entries in the primary datastore to identify duplicate data blocks in the storage device and frees the identified duplicate data blocks in the storage device. The storage server stores the entries that correspond to the freed data blocks to a third datastore and overwrites the primary datastore with the entries from the merged data that correspond to the unique fingerprints to create an updated primary datastore.
Abstract:
One or more techniques and/or systems are disclosed for enabling communication between a SAS communication port of a SAS communication component and multiple storage devices. In a first example, a first SAS to SATA bridge chip and a second SAS to SATA bridge chip may be configured to route data from a SAS communication component to multiple storage devices. In a second example, a SAS to SATA bridge chip and a port multiplier may be configured to route data from a SAS communication component to multiple storage devices. In a third example, a four port SAS to SATA bridge comprising two SAS ports and two SATA ports may be configured to route data from a SAS communication component to multiple storage devices. Supporting two or more storage devices with a single SAS communication port allows storage enclosures to increase storage capacity, while decreasing cost per slot.
Abstract:
The present invention is directed to an architecture for promoting improved cloud computing. The architecture includes a plurality of diskless server nodes. The architecture further includes a plurality of Serial Attached Small Computer System Interface (SAS) switches, the plurality of SAS switches being connected to the plurality of diskless server nodes. The architecture further includes a storage system, the storage system configured for being communicatively coupled to the plurality of servers via the plurality of SAS switches. Further, the storage system is configured for implementing Controlled Replication Under Scalable Hashing (CRUSH) redundancy. Still further, the architecture is configured for dynamically mapping data stores of the storage system to the diskless server nodes.
Abstract:
At least certain embodiments disclose a method, system and apparatus for relocating data between tiers of storage media in a hybrid storage aggregate encompassing multiple tiers of heterogeneous physical storage media including a file system to automatically relocate the data between tiers. The hybrid storage aggregate includes one or more volumes, each volume including a volume block number space spanning at least a first-tier of storage media and a second tier of storage media of the multiple tiers of heterogeneous physical storage media and the hybrid storage aggregate further includes a control module to cooperatively manage the tiers of the multiple tiers of heterogeneous physical storage media and a file system coupled with the control module, the file system including a policy module configured to make policy decisions based on a set of one or more policies and configured to automatically relocate data between different tiers of the multiple tiers of heterogeneous physical storage media based on the set of policies.
Abstract:
A method implements a hierarchical cache system with a parallel Network File System (pNFS) configuration for a storage system. Upon receiving a request by the hierarchical cache system to access data stored in the storage system, the method divides the data into a plurality of data segments and distributes the plurality of data segments to a plurality of cache servers of the cache system. The method responds to the request a metadata layout for the plurality of data segments distributed among the plurality of cache servers. Based on the metadata layout, the plurality of data segments can be concurrently retrieved from the plurality of cache servers.
Abstract:
A network storage server implements a method to limit simultaneous data transfers and efficient throttle management. The number of processes that can be simultaneously performed in the network storage server is limited. For the processes that do not exceed the limiting number, and are therefore allowed to be simultaneously performed, a throttle control is implemented on each of the processes to limit the amount of system resources that can be allocated to each of the processes. The processes are performed on the network storage server, and a total amount of system resources allocated to these processes does not exceed the available system resources of the network storage server.
Abstract:
A technique for organizing data to facilitate data deduplication includes dividing a block-based set of data into multiple "chunks", where the chunk boundaries are independent of the block boundaries (due to the hashing algorithm). Metadata of the data set, such as block pointers for locating the data, are stored in a tree structure that includes multiple levels, each of which includes at least one node. The lowest level of the tree includes multiple nodes that each contain chunk metadata relating to the chunks of the data set. In each node of the lowest level of the buffer tree, the chunk metadata contained therein identifies at least one of the chunks. The chunks (user-level data) are stored in one or more system files that are separate from the buffer tree and not visible to the user.