System and method for data deduplication using log-structured merge trees
Abstract:
Disclosed are systems, methods, and computer program products for data deduplication during a backup using at least two log-structured merge (LSM) trees. An example method includes calculating a first hash value for a first data block and determining a reduced hash value based on the first hash value. The method includes determining whether the first data block contains data duplicative of an existing data block in a prior backup based on whether the reduced hash value occurs in a first LSM tree. If the reduced hash value occurs in the first LSM tree, the method includes comparing the first hash value to one or more hash values in a second LSM tree to identify a matching hash value, and writing to an archive a first segment identifier (ID) corresponding to the matching hash value, the first segment ID referencing the existing data block in a segment store.
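The two-stage lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. Several details are assumptions made only for illustration: a set and a dict stand in for the two LSM trees, SHA-256 is the hash function, the reduced hash is a 4-byte prefix of the full hash, and segment IDs are list indices. The handling of a non-duplicate block (storing it and indexing it in both structures) is also assumed, since the abstract does not describe it. The names `DedupIndex` and `backup_block` are hypothetical.

```python
import hashlib

# Assumption: the reduced hash is a short prefix of the full hash.
REDUCED_BYTES = 4


class DedupIndex:
    """Toy stand-in for the two LSM trees; in-memory containers replace real LSM storage."""

    def __init__(self):
        self.reduced_tree = set()   # stands in for the first LSM tree (reduced hashes)
        self.full_tree = {}         # stands in for the second LSM tree (full hash -> segment ID)
        self.segment_store = []     # segment ID is the index into this list

    def backup_block(self, block: bytes, archive: list) -> bool:
        """Append a segment ID for `block` to `archive`; return True if it was deduplicated."""
        full = hashlib.sha256(block).digest()
        reduced = full[:REDUCED_BYTES]

        # Stage 1: cheap membership test on the reduced hash.
        if reduced in self.reduced_tree:
            # Stage 2: confirm against full hashes; a reduced-hash hit may be a collision.
            seg_id = self.full_tree.get(full)
            if seg_id is not None:
                archive.append(seg_id)  # reference the existing block, store nothing new
                return True

        # Assumed miss path: store the block and index it in both structures.
        seg_id = len(self.segment_store)
        self.segment_store.append(block)
        self.reduced_tree.add(reduced)
        self.full_tree[full] = seg_id
        archive.append(seg_id)
        return False


index = DedupIndex()
archive = []
for data in (b"alpha", b"beta", b"alpha"):
    index.backup_block(data, archive)
```

In this sketch the third block repeats the first, so the archive records the first block's segment ID again and the segment store holds only two blocks. The reduced-hash check serves as an inexpensive filter, so the larger full-hash structure is consulted only when a duplicate is plausible.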