Abstract:
A rebuild node of a storage system can assess the risk of the storage system not being able to provide a data object. The rebuild node uses information about data object fragments to determine the health of a data object, which relates to the risk assessment. The rebuild node obtains object fragment information from nodes throughout the storage system. With the object fragment information, the rebuild node can assess object risk based, at least in part, on the object fragments indicated as existing by the nodes. To assess object risk, the rebuild node treats absent object fragments (i.e., those for which an indication was not received) as lost. When too many object fragments are lost, an object cannot be rebuilt. The erasure coding technique used to encode an object dictates the threshold number of fragments needed to rebuild it. The per-object risk assessment influences rebuild of the objects.
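A minimal sketch of this idea, assuming k-of-n erasure coding, is shown below; the data structure, field names, and scoring formula are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch: per-object risk from reported fragments under k-of-n
# erasure coding. Field names and the scoring formula are assumptions.
from dataclasses import dataclass

@dataclass
class ObjectFragmentReport:
    object_id: str
    total_fragments: int      # n fragments written for the object
    rebuild_threshold: int    # k fragments needed to rebuild (set by the erasure code)
    reported_fragments: set   # fragment indexes nodes reported as existing

def assess_object_risk(report: ObjectFragmentReport) -> float:
    """Return a risk score in [0.0, 1.0]; absent fragments are treated as lost."""
    existing = len(report.reported_fragments)
    lost = report.total_fragments - existing
    if existing < report.rebuild_threshold:
        return 1.0  # too many fragments lost; the object cannot be rebuilt
    margin = report.total_fragments - report.rebuild_threshold
    # Risk grows as lost fragments consume the margin above the rebuild threshold.
    return lost / margin if margin > 0 else 0.0
```

Under this sketch, objects with higher scores would simply be rebuilt first, reflecting how the per-object risk assessment could influence rebuild order.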
Abstract:
A dynamic caching technique adaptively controls the number of copies of data blocks stored within caches (“cached copies”) of a caching layer distributed among servers of a distributed data processing system. A cache coordinator of the distributed system implements the dynamic caching technique to increase the cached copies of the data blocks to improve processing performance of the servers. Alternatively, the technique may decrease the cached copies to reduce the storage capacity consumed at the servers. The technique may increase the cached copies when it detects local and/or remote cache bottleneck conditions at the servers, a data popularity condition at the servers, or a bottleneck condition at a shared storage system. Otherwise, the technique may decrease the cached copies at the servers.
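A minimal sketch of such a coordinator decision, assuming boolean condition signals and a simple one-copy-at-a-time policy, follows; the parameter names and caps are illustrative only.

```python
# Illustrative sketch of a cache coordinator's per-block decision.
# Condition flags and the one-copy-at-a-time policy are assumptions.
def adjust_cached_copies(current_copies: int,
                         max_copies: int,
                         local_cache_bottleneck: bool,
                         remote_cache_bottleneck: bool,
                         data_popular: bool,
                         shared_storage_bottleneck: bool) -> int:
    """Return the new number of cached copies for a data block."""
    if (local_cache_bottleneck or remote_cache_bottleneck
            or data_popular or shared_storage_bottleneck):
        # A detected condition: add a cached copy (up to a cap) to relieve it.
        return min(current_copies + 1, max_copies)
    # Otherwise shed a copy to free cache capacity at the servers.
    return max(current_copies - 1, 1)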
Abstract:
A method of performing global deduplication may include: collecting a data chunk to be written to a backing storage of a storage system at a staging area in the storage system; generating a data fingerprint of the data chunk; sending the data fingerprint in a batch, along with other data fingerprints corresponding to data chunks collected at different times, to a metadata server system in the storage system; receiving, at the staging area, an indication from the metadata server system of whether the data fingerprint is unique in the storage system; and, when the indication indicates that the data chunk is not unique, discarding the data chunk when committing a data object containing the data chunk to the backing storage.
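A rough sketch of this staged, batched flow is given below; the SHA-256 fingerprint, the batch size, and the metadata-server lookup interface (check_unique) are assumptions introduced here for illustration.

```python
# Illustrative only: staging-area deduplication with batched fingerprints.
# The hash choice, batch size, and metadata_server.check_unique() interface
# are assumptions.
import hashlib

class StagingArea:
    def __init__(self, metadata_server, batch_size=128):
        self.metadata_server = metadata_server
        self.batch_size = batch_size
        self.pending = {}  # fingerprint -> data chunk collected at the staging area

    def collect(self, chunk: bytes):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        self.pending[fingerprint] = chunk
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send the fingerprints in batch; the metadata server system replies,
        # per fingerprint, whether it is unique in the storage system.
        uniqueness = self.metadata_server.check_unique(list(self.pending))
        for fingerprint, chunk in self.pending.items():
            if uniqueness[fingerprint]:
                self._commit(fingerprint, chunk)
            # Non-unique chunks are discarded when the containing object is
            # committed; only a reference to the existing copy would be kept.
        self.pending.clear()

    def _commit(self, fingerprint, chunk):
        # Write the chunk as part of a data object in backing storage (omitted).
        pass
```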
Abstract:
A method, non-transitory computer readable medium, and system node computing device that generate a snapshot identifier and return the snapshot identifier in response to a received request to create a snapshot of a NoSQL database. A determination is made as to when an entry in a transaction table has a first transaction value corresponding to a transaction that has been committed and a second transaction value that is either unassigned or corresponds to another transaction that has not been committed. The snapshot identifier is inserted into the entry when the entry is determined to have the first transaction value corresponding to the committed transaction and the second transaction value that is unassigned or corresponds to the uncommitted transaction.
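A hedged sketch of this snapshot-marking check is shown below, assuming a dict-shaped transaction-table entry with first_txn and second_txn fields; these names are hypothetical.

```python
# Illustrative sketch: insert a snapshot identifier into a transaction-table
# entry when it is eligible. Entry field names are hypothetical.
def try_mark_snapshot(entry: dict, snapshot_id: str, committed_txns: set) -> bool:
    """Mark the entry with snapshot_id if its transaction values qualify."""
    # First transaction value must refer to a committed transaction.
    first_ok = entry.get("first_txn") in committed_txns
    # Second transaction value must be unassigned, or refer to a transaction
    # that has not been committed.
    second = entry.get("second_txn")
    second_ok = second is None or second not in committed_txns
    if first_ok and second_ok:
        entry["snapshot_id"] = snapshot_id
        return True
    return False
```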
Abstract:
A method, non-transitory computer readable medium, and system node computing device that facilitate a NoSQL datastore with integrated management. In some embodiments, this technology provides a fast, highly available, and application-integrated NoSQL database that can be established in a data storage network such that various data management policies are automatically implemented. This technology enables application administrators to more effectively leverage NoSQL databases by storing data in tables located on storage nodes in groups and zones that have associated SLCs, as previously established upon creation of the tables or an associated entity group or database. Accordingly, management of the data is relatively integrated and data tiering can be more efficiently implemented. This technology also provides a highly scalable infrastructure that can dynamically add capacity having predictable and established service levels, and that optimizes the storage of data on types of media having different characteristics in order to provide cost-effective storage.
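As one hedged illustration of the policy inheritance described above, a table created within an entity group might inherit the group's service level unless one is given at creation; the class, method, and SLC names below are assumptions, not the patented interface.

```python
# Illustrative only: tables inherit a service level (SLC) from their entity
# group at creation time. Class, method, and SLC names are assumptions.
class EntityGroup:
    def __init__(self, name, slc, zone):
        self.name = name
        self.slc = slc      # service level associated with the group
        self.zone = zone    # zone of storage nodes backing the group
        self.tables = {}

    def create_table(self, table_name, slc=None):
        # A table takes the SLC given at creation, else the group's SLC.
        self.tables[table_name] = {"slc": slc or self.slc, "zone": self.zone}
        return self.tables[table_name]

group = EntityGroup("orders", slc="gold", zone="zone-a")
group.create_table("order_events")                 # inherits the group's "gold" SLC
group.create_table("order_archive", slc="silver")  # explicit lower service level
```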