Abstract:
A method for managing deduplication reference data may include (1) identifying multiple data containers configured to store a plurality of deduplicated data segments that are referenced by multiple data objects within a deduplicated data system, (2) maintaining multiple reference databases including (i) a first reference database corresponding to a first subset of the data containers and (ii) a second reference database corresponding to a second subset of the data containers, the second subset differing from the first subset, (3) determining that a data object references at least one data segment within a first data container within the first subset but does not reference any data segment within a second data container within the second subset, and (4) updating the first reference database with information specifying that the data object references at least one data segment within at least one data container within the first subset of data containers.
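The following is a minimal sketch of the per-subset reference bookkeeping described above. It is not the claimed implementation. The class and method names (DataContainer, ReferenceDatabase, ReferenceManager, add_object) are hypothetical. It assumes that each segment fingerprint is stored in exactly one container and that each reference database records, for every data object, which containers in its own subset that object references. When an object is added, only the databases whose subsets contain a referenced container are updated. In the example from step (3), the second database is left untouched.

```python
from dataclasses import dataclass, field


@dataclass
class DataContainer:
    container_id: int
    segment_fingerprints: set[str]


@dataclass
class ReferenceDatabase:
    # Hypothetical: each reference database covers one subset of containers.
    container_ids: frozenset[int]
    # data object id -> container ids (within this subset) the object references
    references: dict[str, set[int]] = field(default_factory=dict)

    def record(self, object_id: str, container_ids: set[int]) -> None:
        self.references.setdefault(object_id, set()).update(container_ids)


class ReferenceManager:
    def __init__(self, containers: list[DataContainer], subsets: list[set[int]]):
        self.databases = [ReferenceDatabase(frozenset(s)) for s in subsets]
        # Assumption: every segment is stored in exactly one container.
        self.segment_location: dict[str, int] = {}
        for c in containers:
            for fp in c.segment_fingerprints:
                self.segment_location[fp] = c.container_id

    def add_object(self, object_id: str, fingerprints: list[str]) -> list[ReferenceDatabase]:
        """Record a data object's references; return the databases that changed."""
        referenced = {self.segment_location[fp] for fp in fingerprints}
        updated = []
        for db in self.databases:
            hits = referenced & db.container_ids
            if hits:  # skip databases whose subset the object does not reference
                db.record(object_id, hits)
                updated.append(db)
        return updated
```

For example, with containers 1 and 2 in the first subset and container 3 in the second, adding an object whose segments all live in container 1 updates only the first reference database. Partitioning the reference data this way keeps each update local to the containers an object actually touches.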
Abstract:
The present disclosure provides for implementing a two-level fingerprint caching scheme for a client cache and a server cache. The client cache hit ratio can be improved by pre-populating the client cache with fingerprints that are relevant to the client. Relevant fingerprints include fingerprints used during a recent time period (e.g., fingerprints of segments that are included in the last full backup image and any following incremental backup images created for the client after the last full backup image), and thus are referred to as fingerprints with good temporal locality. Relevant fingerprints also include fingerprints associated with a storage container that has good spatial locality, and thus are referred to as fingerprints with good spatial locality. A pre-set threshold established for the client cache (e.g., threshold Tc) is used to determine whether a storage container (and thus fingerprints associated with the storage container) has good spatial locality.
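The following sketch shows how a client cache could be pre-populated under stated assumptions. It is not the disclosed implementation, and all names (BackupImage, temporal_fingerprints, spatial_fingerprints, prepopulate_client_cache) are hypothetical. Temporal locality follows the abstract: the fingerprints in the client's last full backup image and in any later incremental images. The abstract does not say what threshold Tc is compared against. This sketch assumes Tc is a minimum fraction: a container has good spatial locality if the share of its fingerprints that already appear in the client's recent fingerprints is at least Tc.

```python
from dataclasses import dataclass


@dataclass
class BackupImage:
    client: str
    is_full: bool
    timestamp: int
    fingerprints: set[str]


def temporal_fingerprints(images: list[BackupImage], client: str) -> set[str]:
    """Fingerprints in the client's last full backup and all later incrementals."""
    mine = sorted((im for im in images if im.client == client), key=lambda im: im.timestamp)
    fulls = [i for i, im in enumerate(mine) if im.is_full]
    if not fulls:
        return set()
    recent: set[str] = set()
    for im in mine[fulls[-1]:]:  # last full image plus everything after it
        recent |= im.fingerprints
    return recent


def spatial_fingerprints(containers: dict[int, set[str]], used: set[str], tc: float) -> set[str]:
    """Assumed criterion: include a container if >= tc of its fingerprints are in `used`."""
    selected: set[str] = set()
    for fps in containers.values():
        if fps and len(fps & used) / len(fps) >= tc:
            selected |= fps
    return selected


def prepopulate_client_cache(images: list[BackupImage], containers: dict[int, set[str]],
                             client: str, tc: float) -> set[str]:
    temporal = temporal_fingerprints(images, client)
    return temporal | spatial_fingerprints(containers, temporal, tc)
```

Suppose a container holds fingerprints {b, c, e} and the client's recent fingerprints include b and c. With Tc = 0.5 the container qualifies (2 of 3), so e is also loaded into the client cache even though the client has not used it recently. Lookups that miss the client cache would then fall through to the server cache.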