Digest based data matching in similarity based deduplication

Invention Grant

US10296598B2 Digest based data matching in similarity based deduplication (Active)

Patent Title: Digest based data matching in similarity based deduplication
Application No.: US13941694

Application Date: 2013-07-15
Publication No.: US10296598B2

Publication Date: 2019-05-21
Inventor: Lior Aronovich
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Applicant Address: US NY Armonk
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee Address: US NY Armonk
Agency: Griffiths & Seaton PLLC
Main IPC: G06F17/30
IPC: G06F17/30

Abstract:

Data matches are calculated between input data and repository data via a digest based matching algorithm, in which the reference digests corresponding to a repository interval of data identified as similar to an input interval of data are loaded into a sequential array and into a search structure. Each of the matching digests found using the search structure is extended using the sequential array of reference digests. Repository data intervals are determined as similar to an input data interval. Reference digests corresponding to the similar repository data interval are loaded into a sequential representation and into a search structure. Matches of input digests and the reference digests are found using the search structure. Each one of the found matches of the input digests and repository digests is extended using the sequential representation. Data matches are determined between the input data and the repository data using the extended matches of digests.
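
The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `find_extended_matches`, the use of a Python dict as the search structure, the forward-only extension, and the choice of the longest run when a digest occurs more than once are all assumptions made for illustration. Digests are treated as opaque hashable values, and the similarity search that selects the repository interval is assumed to have already run.

```python
from collections import defaultdict


def find_extended_matches(input_digests, reference_digests):
    """Return (input_start, reference_start, length) runs of matching digests.

    Hypothetical helper: names and tie-breaking rules are illustrative only.
    """
    # Sequential representation: reference digests in repository order.
    seq = list(reference_digests)

    # Search structure: digest value -> positions where it occurs in seq.
    index = defaultdict(list)
    for pos, digest in enumerate(seq):
        index[digest].append(pos)

    matches = []
    i = 0
    while i < len(input_digests):
        best = None  # (reference_start, run_length) of the longest run found
        for pos in index.get(input_digests[i], ()):
            # Extend the single-digest match forward by walking the
            # sequential representation in step with the input digests.
            n = 1
            while (i + n < len(input_digests)
                   and pos + n < len(seq)
                   and input_digests[i + n] == seq[pos + n]):
                n += 1
            if best is None or n > best[1]:
                best = (pos, n)
        if best is None:
            i += 1  # no reference digest matches this input digest
        else:
            matches.append((i, best[0], best[1]))
            i += best[1]  # resume after the extended match
    return matches


if __name__ == "__main__":
    reference = ["q", "a", "b", "c", "d", "e", "z"]
    incoming = ["a", "b", "c", "x", "d", "e"]
    print(find_extended_matches(incoming, reference))
```

In this sketch the dict is consulted only to find the first digest of a run. Once a hit is found, the run is extended by comparing neighbouring entries in the sequential list, so no further lookups are needed for that run.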

Public/Granted literature

US20150019499A1 DIGEST BASED DATA MATCHING IN SIMILARITY BASED DEDUPLICATION
Public/Granted day: 2015-01-15
