Invention Grant
- Patent Title: Detecting data deduplication opportunities using hash distance
- Application No.: US16392964
- Application Date: 2019-04-24
- Publication No.: US11112985B2
- Publication Date: 2021-09-07
- Inventor: Ivan Bassov, Philippe Armangau, Sorin Faibish, Istvan Gonczi
- Applicant: EMC IP Holding Company LLC
- Applicant Address: US MA Hopkinton
- Assignee: EMC IP Holding Company LLC
- Current Assignee: EMC IP Holding Company LLC
- Current Assignee Address: US MA Hopkinton
- Agency: Muirhead and Saturnelli, LLC
- Main IPC: G06F3/06
- IPC: G06F3/06; H04L9/06; G06F11/10

Abstract:
Techniques for processing data may include: receiving a candidate data block; computing a distance using a distance function, wherein the distance denotes a measurement of similarity between the candidate data block and a target data block; and determining, using the distance, whether to perform data deduplication of the candidate data block with respect to the target data block to identify at least one sub-block of the candidate data block that is a duplicate of at least one sub-block of the target data block. The distance may be computed using a bit-wise logical exclusive-or operation of the contents of the candidate data block and the target data block. The distance may be computed using a bit-wise logical exclusive-or operation of digests computed for the candidate and target data blocks using a distance preserving hash function. The target and candidate blocks may be considered similar if the distance is less than a threshold.
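
The distance test described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the distance is taken to be the number of set bits in the XOR of the two inputs (a Hamming distance), the blocks have equal length, and the names `hamming_distance` and `is_dedup_candidate` and the threshold value are hypothetical.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    # XOR leaves a 1 bit wherever the two inputs differ.
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    # Counting the 1 bits as the distance is an assumption of this sketch.
    return bin(x).count("1")


def is_dedup_candidate(candidate: bytes, target: bytes, threshold: int) -> bool:
    """Treat the blocks as similar when the distance is below the threshold."""
    return hamming_distance(candidate, target) < threshold


# Example: a 16-byte target and a candidate that differs in two bits.
target = bytes(range(16))
modified = bytearray(target)
modified[3] ^= 0b00000101  # flip two bits in one byte
candidate = bytes(modified)

dist = hamming_distance(candidate, target)
similar = is_dedup_candidate(candidate, target, threshold=8)  # hypothetical threshold
```

The same comparison could be applied to digests of the two blocks instead of their raw contents, provided the digests come from a distance preserving hash function; no such function is shown here because the abstract does not specify one.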
Public/Granted literature
- US20200341666A1 DETECTING DATA DEDUPLICATION OPPORTUNITIES USING HASH DISTANCE (Public/Granted day: 2020-10-29)