Abstract:
A storage server is coupled to a storage device that stores blocks of data, and generates a fingerprint for each data block stored on the storage device. The storage server creates a fingerprints datastore that is divided into a primary datastore and a secondary datastore. The primary datastore comprises a single entry for each unique fingerprint, and the secondary datastore comprises entries that have the same fingerprint as an entry in the primary datastore. The storage server merges entries in a changelog with the entries in the primary datastore to identify duplicate data blocks in the storage device, and frees the identified duplicate data blocks. The storage server stores the entries that correspond to the freed data blocks in a third datastore and overwrites the primary datastore with the entries from the merged data that correspond to the unique fingerprints, creating an updated primary datastore.
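The datastore layout and changelog merge described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: SHA-256 is assumed as the fingerprint function (the abstract names none), blocks are addressed by hypothetical integer block numbers, and the names `Entry`, `FingerprintsDatastore`, `build`, and `merge_changelog` are invented for illustration. Redirecting file references to the retained donor block is omitted.

```python
import hashlib
from dataclasses import dataclass, field
from itertools import groupby
from typing import Callable


@dataclass(frozen=True)
class Entry:
    fingerprint: str
    block: int  # hypothetical physical block number


def fingerprint(data: bytes) -> str:
    # Assumption: SHA-256 as the fingerprint function.
    return hashlib.sha256(data).hexdigest()


@dataclass
class FingerprintsDatastore:
    primary: list = field(default_factory=list)    # one entry per unique fingerprint
    secondary: list = field(default_factory=list)  # entries whose fingerprint already has a primary entry
    freed: list = field(default_factory=list)      # "third datastore": entries for freed duplicate blocks

    @classmethod
    def build(cls, blocks: dict) -> "FingerprintsDatastore":
        """Split fingerprints of existing blocks into primary and secondary datastores."""
        store = cls()
        seen = set()
        for block_no in sorted(blocks):
            entry = Entry(fingerprint(blocks[block_no]), block_no)
            if entry.fingerprint in seen:
                store.secondary.append(entry)
            else:
                seen.add(entry.fingerprint)
                store.primary.append(entry)
        store.primary.sort(key=lambda e: e.fingerprint)
        return store

    def merge_changelog(self, changelog: list, free_block: Callable[[int], None]) -> None:
        """Merge changelog entries into the primary datastore and free duplicate blocks."""
        # Tag 0 sorts primary entries ahead of changelog entries with the same
        # fingerprint, so the already-stored block is kept as the donor.
        tagged = [(e.fingerprint, 0, e) for e in self.primary]
        tagged += [(e.fingerprint, 1, e) for e in changelog]
        tagged.sort(key=lambda t: (t[0], t[1], t[2].block))
        updated_primary = []
        for _, group in groupby(tagged, key=lambda t: t[0]):
            group = list(group)
            updated_primary.append(group[0][2])  # keep one entry per unique fingerprint
            for _, _, dup in group[1:]:
                free_block(dup.block)   # release the duplicate block on the device
                self.freed.append(dup)  # record the freed block in the third datastore
        # Overwrite the primary datastore with the merged, unique entries.
        self.primary = updated_primary
```

For example, building the store from blocks `{0: b"alpha", 1: b"beta", 2: b"alpha"}` places block 2 in the secondary datastore; merging a changelog that contains a new `b"beta"` block frees that new block and records it in `freed`, while a new `b"gamma"` block joins the primary datastore. Sorting both inputs by fingerprint lets the merge run as a single sequential pass, which suits datastores too large to index in memory.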
Abstract:
The techniques introduced here enable deduplication operations for a file system without significantly degrading the file system's read performance through fragmentation of its data sets. The techniques include determining, by a storage server that hosts the file system, the level of fragmentation that would be introduced to a data set stored in the file system as a result of performing a deduplication operation on that data set. The storage server then compares the level of fragmentation with a threshold value and determines, based on the result of that comparison, whether to perform the deduplication operation. The threshold value represents an acceptable level of fragmentation in the data sets of the file system.
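The threshold decision described above can be sketched as follows. The abstract does not say how fragmentation is measured, so this sketch assumes one simple metric: the fraction of logically adjacent blocks in a data set that are not physically contiguous. The "introduced" fragmentation is taken to be the increase in that metric if each deduplicated block were remapped to its donor. All function names and the `donors` mapping are hypothetical.

```python
def fragmentation_level(layout: list[int]) -> float:
    """Fraction of logically adjacent block pairs that are not physically contiguous."""
    if len(layout) < 2:
        return 0.0
    breaks = sum(1 for a, b in zip(layout, layout[1:]) if b != a + 1)
    return breaks / (len(layout) - 1)


def layout_after_dedup(layout: list[int], donors: dict[int, int]) -> list[int]:
    """Physical layout if each block found in `donors` were replaced by its donor block."""
    return [donors.get(pbn, pbn) for pbn in layout]


def fragmentation_introduced(layout: list[int], donors: dict[int, int]) -> float:
    """Increase in fragmentation that the deduplication operation would cause."""
    return fragmentation_level(layout_after_dedup(layout, donors)) - fragmentation_level(layout)


def should_deduplicate(layout: list[int], donors: dict[int, int], threshold: float) -> bool:
    """Perform deduplication only if the introduced fragmentation is acceptable."""
    return fragmentation_introduced(layout, donors) <= threshold
```

With a contiguous layout `[100, 101, 102, 103]` and a threshold of 0.5, remapping only the last block to a distant donor introduces one break in three pairs (about 0.33) and is allowed, whereas remapping the two middle blocks introduces two breaks (about 0.67) and is skipped, preserving sequential read performance.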
Abstract:
One or more techniques and/or computing devices are provided for inline deduplication. For example, a checksum hash table and/or a block number hash table may be maintained within memory (e.g., a storage controller may maintain the hash tables in-core). The checksum hash table may be used for inline deduplication to identify potential donor blocks that may contain the same data as an incoming storage operation. Data within the in-core buffer cache is eligible to serve as potential donor blocks, so inline deduplication may be performed using data from the in-core buffer cache, which may reduce accesses to the underlying storage that the in-core buffer cache caches. The block number hash table may be used to update or remove entries in the hash tables, such as for blocks that are no longer eligible as potential donor blocks (e.g., deleted blocks, blocks evicted from the in-core buffer cache, etc.).
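The interplay of the two hash tables described above can be sketched as follows. This is a minimal illustration under assumptions, not the claimed design: CRC-32 (`zlib.crc32`) stands in for the checksum, a plain dictionary stands in for the in-core buffer cache, and the class and method names are hypothetical. Because a checksum match only identifies a potential donor, the candidate's cached bytes are compared before sharing it.

```python
import zlib


class InlineDedupCache:
    def __init__(self):
        self.by_checksum = {}   # checksum hash table: checksum -> set of candidate donor block numbers
        self.by_block = {}      # block number hash table: block number -> checksum
        self.buffer_cache = {}  # stand-in for the in-core buffer cache: block number -> data

    def add_donor(self, block_no: int, data: bytes) -> None:
        """Register a cached block as a potential donor."""
        self.evict(block_no)  # drop any stale entry for this block number first
        checksum = zlib.crc32(data)
        self.by_checksum.setdefault(checksum, set()).add(block_no)
        self.by_block[block_no] = checksum
        self.buffer_cache[block_no] = data

    def evict(self, block_no: int) -> None:
        """Remove a block that is deleted or evicted; the block number table finds its checksum."""
        checksum = self.by_block.pop(block_no, None)
        if checksum is None:
            return
        candidates = self.by_checksum[checksum]
        candidates.discard(block_no)
        if not candidates:
            del self.by_checksum[checksum]
        self.buffer_cache.pop(block_no, None)

    def find_donor(self, data: bytes):
        """Return a cached block holding identical data, or None."""
        for candidate in self.by_checksum.get(zlib.crc32(data), ()):
            # Verify the bytes from the buffer cache to rule out checksum collisions.
            if self.buffer_cache.get(candidate) == data:
                return candidate
        return None

    def write(self, block_no: int, data: bytes) -> int:
        """Handle an incoming write; return the block that ends up holding the data."""
        donor = self.find_donor(data)
        if donor is not None:
            return donor  # share the donor block instead of writing a new one
        self.add_donor(block_no, data)  # the newly written block becomes a potential donor
        return block_no
```

In this sketch, writing `b"hello"` to block 10 and then again to block 11 returns 10 for the second write; after `evict(10)`, the block is no longer a donor and a later identical write is stored normally. Keeping the reverse block-number table makes eviction a constant-time lookup rather than a scan of every checksum bucket.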