Abstract:
Method and system for access-based directory enumeration are provided. When a directory is enumerated for the first time, user credentials are verified against an access control list (ACL) entry that is referenced by an ACL inode (referred to as an Xnode). The Xnode number is obtained from a file handle for a directory entry. The verification is recorded in a data structure that stores the Xnode identifier and the user identifier. When the directory is enumerated again, the data structure is used to confirm that the user has been validated before, instead of loading and checking against an ACL entry.
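The verification cache described in this abstract can be sketched as follows. This is a minimal illustration only; names such as AccessCache, xnode_id, and load_acl_for_xnode are assumptions made for the example, not identifiers from the patent.

class AccessCache:
    """Records (xnode_id, user_id) pairs that have already passed an ACL check."""

    def __init__(self):
        self._verified = set()

    def is_verified(self, xnode_id, user_id):
        return (xnode_id, user_id) in self._verified

    def record(self, xnode_id, user_id):
        self._verified.add((xnode_id, user_id))


def enumerate_entry_visible(entry, user_id, cache, load_acl_for_xnode):
    """Decide whether a directory entry is visible to the user.

    On the first enumeration the ACL referenced by the entry's Xnode is loaded
    and checked; the result is cached so later enumerations skip the ACL lookup.
    """
    xnode_id = entry["xnode_id"]            # obtained from the entry's file handle
    if cache.is_verified(xnode_id, user_id):
        return True                          # validated on an earlier enumeration
    acl = load_acl_for_xnode(xnode_id)       # expensive: load the ACL entry
    if user_id in acl:                       # simplified credential check
        cache.record(xnode_id, user_id)
        return True
    return False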
Abstract:
A dynamic caching technique adaptively controls copies of data blocks stored within caches ("cached copies") of a caching layer distributed among servers of a distributed data processing system. A cache coordinator of the distributed system implements the dynamic caching technique to increase the cached copies of the data blocks to improve processing performance of the servers. Alternatively, the technique may decrease the cached copies to reduce the storage capacity consumed at the servers. The technique may increase the cached copies when it detects local and/or remote cache bottleneck conditions at the servers, a data popularity condition at the servers, or a shared storage bottleneck condition at the storage system. Otherwise, the technique may decrease the cached copies at the servers.
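A rough sketch of the coordinator's adjustment decision appears below. The condition names mirror the abstract, but the statistics, thresholds, and structure are illustrative assumptions, not the patented mechanism.

from dataclasses import dataclass

@dataclass
class CacheStats:
    local_hit_latency_ms: float      # latency served from the server's own cache
    remote_hit_ratio: float          # fraction of requests served by peer caches
    block_request_rate: float        # requests/sec for this block across servers
    shared_storage_queue_depth: int  # backlog at the shared storage system

def decide_copy_count(stats: CacheStats, current_copies: int,
                      max_copies: int, min_copies: int = 1) -> int:
    """Return the new number of cached copies for a data block."""
    local_bottleneck = stats.local_hit_latency_ms > 5.0
    remote_bottleneck = stats.remote_hit_ratio > 0.5
    popular = stats.block_request_rate > 1000.0
    shared_bottleneck = stats.shared_storage_queue_depth > 32

    if local_bottleneck or remote_bottleneck or popular or shared_bottleneck:
        return min(current_copies + 1, max_copies)    # add a cached copy
    return max(current_copies - 1, min_copies)        # reclaim server capacity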
Abstract:
Methods and apparatuses for performing inter-protocol copy offload operations are provided. In one embodiment, a method includes receiving a request in a first interface protocol from a host device. The request is a request to copy a data set from a source data storage location to a destination data storage location. The request includes a token representing the data set to be copied; the token was created using a second interface protocol that is different from the first interface protocol. The method also includes transferring the data set, in response to receiving the request, from the source data storage location to the destination data storage location without transferring the data set to the host device.
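The following sketch illustrates the handling of such a request. The token format, the resolve helper, and the storage_copy call are assumptions for the example; the essential point is that the data moves between storage locations without passing through the host.

def handle_copy_request(request, token_store, storage_copy):
    """Handle a copy request arriving over a first protocol.

    The token inside the request was minted earlier via a different (second)
    protocol and identifies the source data set to be copied.
    """
    token = request["token"]                 # created via the second protocol
    source = token_store.resolve(token)      # map token -> source data location
    destination = request["destination"]
    # Copy directly between storage locations; no data is returned to the host.
    storage_copy(source, destination)
    return {"status": "ok", "bytes_copied": source["length"]}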
Abstract:
Methods and apparatuses for efficiently migrating deduplicated data are provided. In one example, a data management system includes a data storage volume, a memory including machine executable instructions, and a computer processor. The data storage volume includes data objects and free storage space. The computer processor executes the instructions to perform deduplication of the data objects and determine migration efficiency metrics for groups of the data objects. Determining the migration efficiency metrics includes determining, for each group, a relationship between the free storage space that would result from migrating the group off the volume and the resources required to perform that migration.
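One plausible form of the metric is sketched below: the ratio of space freed on the source volume to the bytes that must be transferred. Both the data model and the ratio form are assumptions used only to illustrate why deduplication-sharing matters for migration efficiency.

def migration_efficiency(group, block_refcounts, block_size=4096):
    """group: set of block ids referenced (possibly shared) by the group's objects.
    block_refcounts: volume-wide reference counts per block id."""
    # Space is freed only for blocks no other object on the volume still references.
    freed_blocks = sum(1 for b in group if block_refcounts[b] == 1)
    freed_bytes = freed_blocks * block_size
    # Migration must transfer every block the group references, shared or not.
    transfer_bytes = len(group) * block_size
    return freed_bytes / transfer_bytes if transfer_bytes else 0.0

def pick_best_group(groups, block_refcounts):
    """Choose the group whose migration frees the most space per unit of work."""
    return max(groups, key=lambda g: migration_efficiency(g, block_refcounts))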
Abstract:
Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object. If the probability of deduplication for the data object has a specified relationship to a specified threshold, a deduplication operation is performed on the data object before the data object is stored in persistent storage of the storage system.
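A minimal sketch of the selective inline-deduplication decision follows. The characteristic used here, the data object's type, and the probability table are invented purely for illustration; the abstract does not specify which characteristic is used.

DEDUP_PROBABILITY_BY_TYPE = {
    "vmdk": 0.9,    # VM images tend to deduplicate well (assumed values)
    "iso": 0.8,
    "jpg": 0.1,     # already-compressed media rarely deduplicates
}

def maybe_dedup_inline(obj, dedup, write_raw, threshold=0.5):
    """Deduplicate the object before it reaches persistent storage only if its
    estimated deduplication probability meets the threshold."""
    probability = DEDUP_PROBABILITY_BY_TYPE.get(obj["type"], 0.3)
    if probability >= threshold:      # the "specified relationship" to the threshold
        dedup(obj)                    # inline deduplication, then store
    else:
        write_raw(obj)                # skip the inline deduplication cost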
Abstract:
The techniques introduced herein provide systems and methods for creating and managing contention-free multi-path access to a distributed data set in a distributed processing system. In one embodiment, a distributed processing system comprises a plurality of compute nodes. The compute nodes are assembled into compute groups and configured such that each compute group has an attached or local storage system. Various data segments of the distributed data set are stored in data storage objects on the local storage system. The data storage objects are cross-mapped into each of the compute nodes in the compute group so that any compute node in the group can access any of the data segments stored in the local storage system via the respective data storage object.
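The cross-mapping idea can be sketched as below. The class and field names are assumptions; the point is that after cross-mapping, every node in a compute group can read any segment held on the group's local storage through its own mapping.

class StorageObject:
    """A data storage object holding segments of the distributed data set."""
    def __init__(self, name, segments):
        self.name = name
        self._segments = segments            # segment_id -> data

    def read(self, segment_id):
        return self._segments[segment_id]

class Node:
    """A compute node; mapped_objects is filled in by cross-mapping."""
    def __init__(self, name):
        self.name = name
        self.mapped_objects = {}

    def read_segment(self, object_name, segment_id):
        # Contention-free multi-path: the segment is reachable through this
        # node's own mapping rather than through a single owning node.
        return self.mapped_objects[object_name].read(segment_id)

class ComputeGroup:
    def __init__(self, nodes, local_storage_objects):
        self.nodes = nodes
        self.local_storage_objects = local_storage_objects   # on the group's local storage

    def cross_map(self):
        """Map every local data storage object into every node of the group."""
        for node in self.nodes:
            for obj in self.local_storage_objects:
                node.mapped_objects[obj.name] = obj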
Abstract:
Systems and methods efficiently distribute information, such as path names, attributes, and object information, corresponding to changes in a content repository to remote nodes in a network using storage-layer/object-based protocols. A difference monitoring client monitors name space and object space changes by identifying inodes that have been modified on storage volumes between two or more snapshots. The monitoring client builds a list that is utilized to update the edge nodes; for each changed inode, the list may include name information, object space information, and attributes such as file size and permissions. Systems and methods also provide for geo-scale content distribution from a central repository to edge nodes using a storage-layer/object protocol. A caching mechanism is utilized to cache requested content at an edge node. Cached content may be maintained at the edge node during use and/or for an additional predetermined period. The difference monitoring client tracks such cached content for later use in the storage system.
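A sketch of the difference-monitoring step is shown below: diff two snapshots' inode tables and build an update list for the edge nodes. The snapshot representation (a mapping from inode number to metadata) and the field names are assumptions for the example.

def diff_snapshots(old_snap, new_snap):
    """Return update records for inodes modified between the two snapshots."""
    updates = []
    for inode, meta in new_snap.items():
        previous = old_snap.get(inode)
        if previous is None or previous["mtime"] != meta["mtime"]:
            updates.append({
                "inode": inode,
                "path": meta["path"],              # name-space information
                "size": meta["size"],              # attributes
                "permissions": meta["permissions"],
                "object_id": meta["object_id"],    # object-space information
            })
    return updates

def push_updates(updates, edge_nodes):
    """Send the change list to each edge node over the storage-layer protocol."""
    for node in edge_nodes:
        node.apply(updates)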
Abstract:
Systems and methods that enable the optimal creation of a storage object within a virtual storage system are disclosed. In accordance with embodiments, an optimal location within the storage system is determined in response to receiving an indication that a storage object is to be created within the storage system. The system and method prioritize the physical storage resources in which to create the storage object, prioritize the components to be provided access to the created storage object, and prioritize the interface between the physical storage resources and the accessing components. The storage object is optimally created within the storage system based on these priorities and based, at least in part, on other created storage objects.
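One way to picture combining the three priorities is the placement-scoring sketch below. The weighting scheme, candidate fields, and load penalty are purely illustrative assumptions about how such a ranking might work.

def score_placement(candidate, weights=(0.5, 0.3, 0.2)):
    """Combine the three priorities named in the abstract into a single score:
    the physical resource, the component that will access the object, and the
    interface (path) between them."""
    w_res, w_comp, w_if = weights
    return (w_res * candidate["resource_priority"] +
            w_comp * candidate["component_priority"] +
            w_if * candidate["interface_priority"])

def choose_location(candidates, existing_objects):
    """Pick the best placement, penalizing resources already crowded with
    previously created storage objects."""
    def adjusted(c):
        load = sum(1 for o in existing_objects if o["resource"] == c["resource"])
        return score_placement(c) - 0.1 * load
    return max(candidates, key=adjusted)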
Abstract:
A storage system provides highly flexible data layouts that can be tailored based on reliability considerations. The system allocates reliability values to logical containers at an upper logical level of the system based, for example, on objectives established by reliability SLOs. Based on the reliability value, the system identifies a specific parity group from a lower physical storage level of the system for storing data corresponding to the logical container. After selecting a parity group, the system allocates the data to physical storage blocks within the parity group. In embodiments, the system attaches the reliability value information to the parity group and the physical storage units storing the data. In this manner, the underlying physical layer has a semantic understanding of reliability considerations related to the data stored at the logical level. Based on this semantic understanding, the system has the capability to prioritize data operations on the physical storage units according to the reliability values attached to the parity groups.
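A sketch of mapping a logical container's reliability value to a parity group, and of using the attached value to prioritize later operations, is given below. The reliability scale, parity-group fields, and rebuild-ordering policy are assumptions made for the example.

def select_parity_group(reliability_value, parity_groups):
    """Pick the least-loaded parity group whose protection level meets or
    exceeds the container's reliability value."""
    eligible = [g for g in parity_groups
                if g["protection_level"] >= reliability_value]
    if not eligible:
        raise ValueError("no parity group meets the requested reliability")
    group = min(eligible, key=lambda g: g["used_blocks"])
    # Attach the reliability value so the physical layer has a semantic
    # understanding of the data it is holding.
    group.setdefault("reliability_tags", []).append(reliability_value)
    return group

def prioritized_rebuild_order(parity_groups):
    """Rebuild (or scrub) parity groups holding the most reliability-critical
    data first, using the tags attached above."""
    return sorted(parity_groups,
                  key=lambda g: max(g.get("reliability_tags", [0])),
                  reverse=True)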
Abstract:
The technique introduced here includes a system and method for identification of duplicate data directly at the data-object level. The technique illustratively utilizes a hierarchical tree of fingerprints for each data object to compare data objects and identify duplicate data blocks referenced by the data objects. The hierarchical fingerprint trees are constructed in such a manner that a top-level fingerprint (or object-level fingerprint) of the hierarchical tree is representative of all data blocks referenced by the corresponding data object. In embodiments, inline techniques are utilized to generate hierarchical fingerprints for new data objects as they are created, and an object-level fingerprint of the new data object is compared against preexisting object-level fingerprints in the storage system to identify exact or close matches. While exact matches result in complete deduplication of data blocks referenced by the data object, hierarchical comparison methods are used for identifying and mapping duplicate data blocks referenced by closely related data objects.
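A hierarchical (Merkle-style) fingerprint tree and a simple comparison can be sketched as follows. The hash choice, two-way fan-in, and leaf-level comparison are assumptions for brevity; an exact match at the top level indicates full deduplication, otherwise lower levels identify which individual blocks are shared.

import hashlib

def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_tree(blocks):
    """Return levels of fingerprints; levels[-1][0] is the object-level fingerprint."""
    level = [fingerprint(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            combined = level[i] + level[i + 1] if i + 1 < len(level) else level[i]
            paired.append(fingerprint(combined.encode()))
        level = paired
        levels.append(level)
    return levels

def find_duplicate_blocks(tree_a, tree_b):
    """Exact match at the object level means every block is a duplicate;
    otherwise compare leaf fingerprints to map shared blocks."""
    if tree_a[-1][0] == tree_b[-1][0]:
        return list(range(len(tree_a[0])))
    return [i for i, (fa, fb) in enumerate(zip(tree_a[0], tree_b[0])) if fa == fb]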