Abstract:
The subject disclosure is directed towards using primary data deduplication concepts for more efficient access of data via content addressable caches. Chunks of data, such as deduplicated data chunks, are maintained in a fast-access client-side cache, which may, for example, hold chunks selected based upon access patterns. The chunked content is content addressable via a hash or other unique identifier of that content in the system. When a chunk is needed, the client-side cache (or caches) is checked for the chunk before going to a file server for the chunk. The file server may likewise maintain content addressable (chunk) caches. Also described are cache maintenance, management and organization, including pre-populating caches with chunks, as well as using RAM and/or solid-state storage device caches.
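The lookup order described in this abstract (client-side cache first, file server only on a miss) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the disclosure: `ChunkCache`, `FileServer`, and `read_chunk` are illustrative names, SHA-256 stands in for "a hash or other unique identifier", and least-recently-used eviction is just one simple access-pattern policy.

```python
import hashlib
from collections import OrderedDict


def chunk_id(data: bytes) -> str:
    """Content address of a chunk (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


class ChunkCache:
    """Hypothetical in-memory chunk cache keyed by content hash, LRU eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._chunks = OrderedDict()  # chunk id -> chunk bytes, oldest first

    def get(self, cid: str):
        data = self._chunks.get(cid)
        if data is not None:
            self._chunks.move_to_end(cid)  # mark as most recently used
        return data

    def put(self, cid: str, data: bytes) -> None:
        self._chunks[cid] = data
        self._chunks.move_to_end(cid)
        while len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)  # evict least recently used


class FileServer:
    """Stand-in for the file server's chunk store; counts fetches."""

    def __init__(self):
        self.store = {}
        self.reads = 0

    def add(self, data: bytes) -> str:
        cid = chunk_id(data)
        self.store[cid] = data  # identical content is stored only once
        return cid

    def fetch(self, cid: str) -> bytes:
        self.reads += 1
        return self.store[cid]


def read_chunk(cid: str, cache: ChunkCache, server: FileServer) -> bytes:
    """Check the client-side cache before going to the file server."""
    data = cache.get(cid)
    if data is None:
        data = server.fetch(cid)
        cache.put(cid, data)
    return data
```

Because the key is derived from the content, two files that share a chunk share one cache entry, so a chunk fetched for one file can serve a later read of another. Pre-populating the cache amounts to calling `put` ahead of the first read.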
Abstract:
The subject disclosure is directed towards partially recalling file ranges of deduplicated files by tracking dirty ranges (ranges modified by user writes) in a way that eliminates or minimizes reading and writing already-optimized adjacent data. The granularity of the tracked ranges does not depend on any file-system granularity for tracking ranges. In one aspect, lazy flushing of tracking data that preserves data integrity and crash consistency is provided. In another aspect, granular partial recall is supported on an open file while a data deduplication system is optimizing that file.
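Dirty-range tracking at byte granularity can be sketched as a sorted list of merged intervals. This is only an illustration under stated assumptions: the class name `DirtyRanges` and its methods are hypothetical, ranges are half-open `(start, end)` byte offsets, and the lazy flushing and crash-consistency aspects are not modeled.

```python
class DirtyRanges:
    """Hypothetical tracker of write-modified byte ranges of one file."""

    def __init__(self):
        self._ranges = []  # sorted, non-overlapping, non-adjacent (start, end)

    def mark(self, offset: int, length: int) -> None:
        """Record a user write; merge with overlapping or adjacent ranges."""
        if length <= 0:
            return
        start, end = offset, offset + length
        merged = []
        for s, e in self._ranges:
            if e < start or s > end:
                merged.append((s, e))  # disjoint: keep unchanged
            else:
                start, end = min(start, s), max(end, e)  # absorb into new range
        merged.append((start, end))
        merged.sort()
        self._ranges = merged

    def ranges(self):
        return list(self._ranges)

    def split(self, offset: int, length: int):
        """Split a read into (start, end, is_dirty) segments.

        Dirty segments would be served from recalled data; clean segments
        can still be served from the already-optimized chunks.
        """
        pos, end = offset, offset + length
        out = []
        for s, e in self._ranges:
            if e <= pos:
                continue
            if s >= end:
                break
            if s > pos:
                out.append((pos, s, False))
                pos = s
            stop = min(e, end)
            out.append((pos, stop, True))
            pos = stop
        if pos < end:
            out.append((pos, end, False))
        return out
```

Since only the bytes actually written are marked, a small write does not force the surrounding optimized data to be read back and rewritten, which is the saving the abstract points to.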
Abstract:
The subject disclosure is directed towards a data storage service that uses hash values, such as substantially collision-free hash values, to maintain data integrity. These hash values are persisted in the form of mappings corresponding to data blocks in one or more data stores. If a data error is detected, these mappings allow the data storage service to search the one or more data stores for data blocks having matching hash values. If a data block is found whose hash value matches that of a corrupted or lost data block, the data storage service uses that data block to repair the corrupted or lost data block.
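The detect-then-repair flow in this abstract can be sketched in Python as follows. All names are hypothetical (`block_hash`, `find_damaged`, `repair`), data stores are modeled as plain dictionaries from block id to bytes, and SHA-256 is an assumed stand-in for a substantially collision-free hash.

```python
import hashlib


def block_hash(data: bytes) -> str:
    """Assumption: SHA-256 as the substantially collision-free hash."""
    return hashlib.sha256(data).hexdigest()


def find_damaged(store: dict, mappings: dict) -> list:
    """Block ids whose data is missing or no longer matches its persisted hash."""
    return [
        block_id
        for block_id, expected in mappings.items()
        if block_id not in store or block_hash(store[block_id]) != expected
    ]


def repair(store: dict, mappings: dict, search_stores: list) -> list:
    """Repair damaged blocks in `store` from any block with a matching hash.

    `mappings` maps block id -> expected hash; `search_stores` is the list of
    stores to search (it may include `store` itself). Returns repaired ids.
    """
    repaired = []
    for block_id in find_damaged(store, mappings):
        expected = mappings[block_id]
        for candidate_store in search_stores:
            match = next(
                (d for d in candidate_store.values() if block_hash(d) == expected),
                None,
            )
            if match is not None:
                store[block_id] = match  # overwrite the corrupted or lost block
                repaired.append(block_id)
                break
    return repaired
```

The linear scan over candidate blocks keeps the sketch short; a service that persists the mappings would more plausibly keep a reverse index from hash value to block location. Blocks with no matching copy anywhere are left unrepaired and simply omitted from the returned list.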