Detecting data deduplication opportunities using hash distance

Invention Grant

US11112985B2 Detecting data deduplication opportunities using hash distance (Active)

Patent Title: Detecting data deduplication opportunities using hash distance
Application No.: US16392964

Application Date: 2019-04-24
Publication No.: US11112985B2

Publication Date: 2021-09-07
Inventor: Ivan Bassov, Philippe Armangau, Sorin Faibish, Istvan Gonczi
Applicant: EMC IP Holding Company LLC
Applicant Address: US MA Hopkinton
Assignee: EMC IP Holding Company LLC
Current Assignee: EMC IP Holding Company LLC
Current Assignee Address: US MA Hopkinton
Agency: Muirhead and Saturnelli, LLC
Main IPC: G06F3/06
IPC: G06F3/06; H04L9/06; G06F11/10

Abstract:

Techniques for processing data may include: receiving a candidate data block; computing a distance using a distance function, wherein the distance denotes a measurement of similarity between the candidate data block and a target data block; and determining, using the distance, whether to perform data deduplication of the candidate data block with respect to the target data block to identify at least one sub-block of the candidate data block that is a duplicate of at least one sub-block of the target data block. The distance may be computed using a bit-wise logical exclusive-or operation of the contents of the candidate data block and the target data block. The distance may be computed using a bit-wise logical exclusive-or operation of digests computed for the candidate and target data blocks using a distance preserving hash function. The target and candidate block may be similar if the distance is less than a threshold.
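
The content-based variant described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the distance is the Hamming distance (the number of set bits in the bit-wise exclusive-or of two equal-length blocks), and the names `hamming_distance`, `is_dedup_candidate`, and `THRESHOLD_BITS`, as well as the threshold value itself, are hypothetical. The same function could be applied to digests instead of raw block contents, provided the digests come from a distance-preserving hash function, which is not shown here.

```python
# Hypothetical threshold (in differing bits); the abstract does not give a value.
THRESHOLD_BITS = 64


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    # XOR leaves a 1 bit exactly where the two inputs disagree.
    xored = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(xored).count("1")


def is_dedup_candidate(candidate: bytes, target: bytes,
                       threshold: int = THRESHOLD_BITS) -> bool:
    """Treat the blocks as similar when the distance is below the threshold."""
    return hamming_distance(candidate, target) < threshold


# Usage: a 4 KiB target block and a candidate that differs in three bits.
target_block = bytes(4096)
candidate_block = bytearray(target_block)
candidate_block[0] ^= 0b00000111
candidate_block = bytes(candidate_block)

distance = hamming_distance(candidate_block, target_block)
similar = is_dedup_candidate(candidate_block, target_block)
```

In this sketch a small distance only flags the pair as worth examining; finding which sub-blocks are actual duplicates would be a separate step.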

Public/Granted literature

US20200341666A1 DETECTING DATA DEDUPLICATION OPPORTUNITIES USING HASH DISTANCE Publication/Grant date: 2020-10-29

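IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F3/00	Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from the processing unit to the output unit, e.g. interface arrangements
G06F3/06	. Digital input from, or digital output to, record carriers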