Abstract:
A system and method for calculating and storing block fingerprints for data deduplication. A fingerprint extraction layer generates a fingerprint of a predefined size, e.g., 64 bits, for each data block stored by a storage system. Each fingerprint is stored in a fingerprint record, and the fingerprint records are, in turn, stored in a fingerprint database for access by the data deduplication module. The data deduplication module may periodically compare the fingerprints to identify duplicate fingerprints, which, in turn, indicate duplicate data blocks.
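The fingerprint comparison described above can be illustrated with a minimal Python sketch. The abstract does not name a fingerprint function, so the truncated BLAKE2b hash used here is an assumption, as are the names `fingerprint`, `find_duplicates`, and the in-memory dictionary standing in for the fingerprint database.

```python
import hashlib
from collections import defaultdict


def fingerprint(block: bytes) -> int:
    """Return a 64-bit fingerprint of a block.

    Assumption: a BLAKE2b digest truncated to 8 bytes; the abstract only
    states that the fingerprint has a predefined size such as 64 bits.
    """
    digest = hashlib.blake2b(block, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def find_duplicates(blocks):
    """Group block numbers by fingerprint and return groups of two or more.

    `blocks` maps a block number to its data. The dictionary `database`
    is a hypothetical stand-in for the fingerprint database.
    """
    database = defaultdict(list)
    for block_no, data in sorted(blocks.items()):
        database[fingerprint(data)].append(block_no)
    return [numbers for numbers in database.values() if len(numbers) > 1]


blocks = {0: b"alpha" * 100, 1: b"beta" * 100, 2: b"alpha" * 100}
duplicates = find_duplicates(blocks)
```

Because distinct blocks can in principle share a 64-bit fingerprint, a real deduplication pass would likely confirm a match by comparing block contents before sharing storage; that verification step is omitted from this sketch.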
Abstract:
A system and method provide continuous data protection using checkpoints in a write anywhere file system. During a consistency point of a write anywhere file system, freed blocks are identified and are appended to a delete log for retention. A consistency point log is updated with a new entry associated with the consistency point. If the file system needs to retrieve its state at a particular point in time, the stored blocks of the delete log may be recovered.
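The relationship between the delete log and the consistency point log can be sketched as follows. This is an illustrative model only: the class `CheckpointLogs`, its method names, and the choice to record the delete log length in each consistency point entry are assumptions, not details taken from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointLogs:
    # Retained freed blocks as (block_no, data), in the order they were freed.
    delete_log: list = field(default_factory=list)
    # One entry per consistency point: (cp_id, delete log length at end of CP).
    cp_log: list = field(default_factory=list)

    def consistency_point(self, cp_id, freed_blocks):
        """Append the blocks freed during this CP, then log the CP itself."""
        self.delete_log.extend(freed_blocks)
        self.cp_log.append((cp_id, len(self.delete_log)))

    def blocks_freed_after(self, cp_id):
        """Return retained blocks freed after the given CP.

        These are the blocks that would need to be recovered to return
        to the state the file system had at that consistency point.
        """
        for logged_id, end_offset in self.cp_log:
            if logged_id == cp_id:
                return self.delete_log[end_offset:]
        raise KeyError(cp_id)


logs = CheckpointLogs()
logs.consistency_point(1, [(10, b"old-a")])
logs.consistency_point(2, [(11, b"old-b"), (12, b"old-c")])
recovered = logs.blocks_freed_after(1)
```

Storing an offset into the delete log in each consistency point entry is one simple way to find the blocks freed after a given point in time; other indexing schemes would work equally well.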
Abstract:
A technique is disclosed for restoring data of sparse volumes, where one or more block pointers within the file system structure are marked as ABSENT, and fetching the appropriate data from an alternate location on demand. Client data access requests to the local storage system initiate a restoration of the data from a backing store as required. A demand generator can also be used to restore the data as a background process by walking through the sparse volume and restoring the data of absent blocks. A pump module is also disclosed to regulate the access of the demand generator. Once all the data has been restored, the volume contains all data locally, and is no longer a sparse volume.
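A small Python model may help make the on-demand restoration concrete. Everything named here is hypothetical: the `ABSENT` sentinel object, the `SparseVolume` class, and the `budget` parameter, which stands in for the pump module by capping how many blocks one pass of the demand generator may restore.

```python
ABSENT = object()  # sentinel marking a block pointer with no local data


class SparseVolume:
    def __init__(self, size, backing_store):
        self.blocks = [ABSENT] * size
        self.backing_store = backing_store  # assumed indexable by block number

    def read(self, n):
        """Serve a client read, fetching from the backing store if needed."""
        if self.blocks[n] is ABSENT:
            self.blocks[n] = self.backing_store[n]
        return self.blocks[n]

    def is_sparse(self):
        return any(block is ABSENT for block in self.blocks)


def demand_generator(volume, budget):
    """Walk the volume and restore at most `budget` absent blocks.

    Returns the number of blocks restored in this pass.
    """
    restored = 0
    for n, block in enumerate(volume.blocks):
        if restored == budget:
            break
        if block is ABSENT:
            volume.read(n)
            restored += 1
    return restored


backing = [b"a", b"b", b"c", b"d"]
vol = SparseVolume(len(backing), backing)
first = vol.read(2)                       # client access restores block 2
pass_one = demand_generator(vol, budget=2)
still_sparse = vol.is_sparse()
pass_two = demand_generator(vol, budget=2)
```

After the second pass every block is held locally, so `vol.is_sparse()` returns `False`, matching the abstract's statement that a fully restored volume is no longer sparse.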
Abstract:
A technique for eliminating duplicate data is provided. Upon receipt of a new data set, one or more anchor points are identified within the data set. A bit-by-bit data comparison is then performed of the region surrounding the anchor point in the received data set with the region surrounding an anchor point stored within a pattern database to identify forward/backward delta values. The duplicate data identified by the anchor point, forward and backward delta values is then replaced in the received data set with a storage indicator.
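The forward and backward delta computation can be sketched in Python. This is a simplified illustration under stated assumptions: the comparison runs byte by byte rather than bit by bit, the anchor is located with a plain substring search, and the function name `match_deltas` and the `<ref:0>` storage indicator are invented for the example.

```python
def match_deltas(data, a, pattern, p, anchor_len):
    """Return (backward, forward) counts of matching bytes around an anchor.

    `a` and `p` are the anchor offsets in the received data and in the
    stored pattern; the anchor bytes themselves are assumed to match.
    """
    back = 0
    while (a - back > 0 and p - back > 0
           and data[a - back - 1] == pattern[p - back - 1]):
        back += 1
    fwd = 0
    data_end, pattern_end = a + anchor_len, p + anchor_len
    while (data_end + fwd < len(data) and pattern_end + fwd < len(pattern)
           and data[data_end + fwd] == pattern[pattern_end + fwd]):
        fwd += 1
    return back, fwd


anchor = b"ANCHOR"
pattern = b"xxHELLO-ANCHOR-WORLDyy"   # region stored in the pattern database
data = b"..HELLO-ANCHOR-WORLD!!"      # newly received data set
a, p = data.find(anchor), pattern.find(anchor)
back, fwd = match_deltas(data, a, pattern, p, len(anchor))

# Replace the duplicate region with a (hypothetical) storage indicator.
indicator = b"<ref:0>"
deduplicated = data[:a - back] + indicator + data[a + len(anchor) + fwd:]
```

In this example the region `HELLO-ANCHOR-WORLD` is common to both inputs, so only the two bytes on either side of it remain as literal data around the indicator.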
Abstract:
A system and method for managing data deduplication of a storage system utilizing persistent consistency point images (PCPIs). Once a target PCPI of a data transfer is generated, a backup management module of the storage system alerts a data deduplication module to begin deduplication of the data contained within the target PCPI. Once the deduplication procedure has been completed, the active file system of the storage system has been deduplicated; however, the target PCPI remains un-deduplicated. In response, the backup management module generates and exports a revised target PCPI. The previous target PCPI may then be deleted, thereby transitioning the exported PCPI's image of the state of the file system to a deduplicated state.
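The sequence of steps can be traced with a toy Python model. It is only a sketch under simplifying assumptions: the file system is a list of block pointers into a `store` dictionary, a PCPI is an immutable copy of those pointers, and the names `deduplicate`, `pcpis`, `"target"`, and `"revised"` are hypothetical.

```python
def deduplicate(pointers, store):
    """Repoint each block at the first block holding identical data."""
    first_seen = {}
    return [first_seen.setdefault(store[b], b) for b in pointers]


store = {1: b"A", 2: b"B", 3: b"A"}    # block number -> data
active = [1, 2, 3]                     # active file system block pointers

# 1. The data transfer completes and a target PCPI is generated.
pcpis = {"target": tuple(active)}

# 2. Deduplication runs on the active file system; the target PCPI
#    still references the original, un-deduplicated blocks.
active = deduplicate(active, store)

# 3. A revised target PCPI is generated from the deduplicated state.
pcpis["revised"] = tuple(active)

# 4. The previous target PCPI is deleted, so blocks that only it
#    referenced can be reclaimed.
del pcpis["target"]
referenced = set(active).union(*pcpis.values())
store = {b: d for b, d in store.items() if b in referenced}
```

Block 3 can be reclaimed only in the final step, because until the previous target PCPI is deleted it still holds a reference to that block.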