Digest based data matching in similarity based deduplication
Abstract:
Data matches are calculated between input data and repository data via a digest-based matching algorithm, in which the reference digests corresponding to a repository interval of data identified as similar to an input interval of data are loaded into a sequential array and into a search structure. Each of the matching digests found using the search structure is extended using the sequential array of reference digests. Repository data intervals are determined as similar to an input data interval. Reference digests corresponding to the similar repository data interval are loaded into a sequential representation and into a search structure. Matches between the input digests and the reference digests are found using the search structure. Each of the found matches between the input digests and the reference digests is extended using the sequential representation. Data matches between the input data and the repository data are determined using the extended matches of digests.
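The matching step described in the abstract can be illustrated with a minimal sketch. It assumes that digests are opaque, comparable values already computed for both the input interval and the similar repository interval, that a Python list serves as the sequential representation, and that a dictionary from digest value to positions serves as the search structure. The function name `match_digests`, the forward-only extension, and the longest-run tie-breaking are illustrative assumptions, not details taken from the patent.

```python
from collections import defaultdict


def match_digests(input_digests, reference_digests):
    """Return runs of equal digests as (input_start, reference_start, length).

    Hypothetical sketch: reference_digests is the sequential array, and
    `index` is the search structure built over the same digests.
    """
    # Search structure: digest value -> all positions in the sequential array.
    index = defaultdict(list)
    for pos, digest in enumerate(reference_digests):
        index[digest].append(pos)

    matches = []
    n, m = len(input_digests), len(reference_digests)
    i = 0
    while i < n:
        best = None
        # Seed matches: look up the current input digest in the search structure.
        for r in index.get(input_digests[i], ()):
            # Extend the seed forward by walking both sequences in step.
            ei, er = i + 1, r + 1
            while ei < n and er < m and input_digests[ei] == reference_digests[er]:
                ei += 1
                er += 1
            length = ei - i
            # Assumption: when several seeds exist, keep the longest run.
            if best is None or length > best[2]:
                best = (i, r, length)
        if best is None:
            i += 1  # no reference digest equals this input digest
        else:
            matches.append(best)
            i += best[2]  # continue after the extended match
    return matches


if __name__ == "__main__":
    inp = ["a", "b", "c", "x", "d", "e"]
    ref = ["q", "a", "b", "c", "d", "e", "z"]
    print(match_digests(inp, ref))
```

In this sketch, the search structure is consulted only once per match to find a seed; the extension itself needs only comparisons against adjacent entries in the sequential array. Turning the resulting digest runs into byte-level data matches would additionally require the offsets of the data segments each digest covers, which the sketch leaves out.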