Disk-image deduplication with hash subset in memory

Invention Grant

US10552075B2 Disk-image deduplication with hash subset in memory (in force)

Patent Title: Disk-image deduplication with hash subset in memory
Application No.: US15877566

Application Date: 2018-01-23
Publication No.: US10552075B2

Publication Date: 2020-02-04
Inventor: Oleg Zaydman
Applicant: VMWARE, INC.
Applicant Address: US CA Palo Alto
Assignee: VMware, Inc.
Current Assignee: VMware, Inc.
Current Assignee Address: US CA Palo Alto
Agency: Fish & Richardson P.C.
Main IPC: G06F3/06
IPC: G06F3/06 ; G06F16/11 ; G06F9/455

Abstract:

Deduplication of virtual-machine disk images and other disk images can involve identifying the first clusters in a file. The clusters are hashed. The first-in-file hashes (generated from first-in-file clusters) are stored in an in-memory index, while the full set of hashes is streamed in order to find matches with the hashes stored in the in-memory index. First-in-file hashes in the stream are compared, while other hashes in the stream are compared only if the immediately preceding hash resulted in a match. Comparing non-first-in-file hashes requires disk accesses, but since such comparisons are conditioned on first-in-file matches, they are relatively likely to result in sequences of matches. The net effect is a relatively fast deduplication with compression approaching that resulting from a full comparison of all hashes.
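
The matching rule in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `ReferenceStore`, `read_hash`, and `deduplicate`, the 4-byte cluster size, the use of SHA-256, and the modelling of a disk image as a list of byte strings (one per file) are all assumptions made for illustration. A plain Python list stands in for the disk-resident hash data, and a counter records how many simulated disk accesses occur.

```python
import hashlib
from typing import Dict, List, Optional

CLUSTER_SIZE = 4  # assumption: tiny clusters for illustration; real clusters are far larger


def cluster_hashes(data: bytes) -> List[str]:
    """Split one file into fixed-size clusters and hash each cluster."""
    return [
        hashlib.sha256(data[i:i + CLUSTER_SIZE]).hexdigest()
        for i in range(0, len(data), CLUSTER_SIZE)
    ]


class ReferenceStore:
    """Hashes of already-stored files (hypothetical helper class)."""

    def __init__(self) -> None:
        self.on_disk: List[List[str]] = []     # stand-in for disk: full hash sequence per file
        self.first_index: Dict[str, int] = {}  # in memory: first-in-file hash -> file id
        self.disk_reads = 0                    # counts simulated disk accesses

    def add_file(self, data: bytes) -> None:
        hashes = cluster_hashes(data)
        if not hashes:
            return
        self.on_disk.append(hashes)
        # Only the first-in-file hash goes into the in-memory index.
        self.first_index.setdefault(hashes[0], len(self.on_disk) - 1)

    def read_hash(self, file_id: int, pos: int) -> Optional[str]:
        """Simulated disk access for a non-first-in-file hash."""
        self.disk_reads += 1
        seq = self.on_disk[file_id]
        return seq[pos] if pos < len(seq) else None


def deduplicate(store: ReferenceStore, files: List[bytes]) -> List[List[bool]]:
    """For each incoming file, report which clusters matched stored data."""
    results: List[List[bool]] = []
    for data in files:
        matched: List[bool] = []
        file_id: Optional[int] = None  # stored file being followed while matches continue
        for pos, h in enumerate(cluster_hashes(data)):
            if pos == 0:
                # First-in-file hash: always compared, using memory only.
                file_id = store.first_index.get(h)
                matched.append(file_id is not None)
            elif file_id is not None and store.read_hash(file_id, pos) == h:
                # Non-first hash: compared only because the previous hash matched.
                matched.append(True)
            else:
                # Chain broken (or never started): skip comparison for the rest of the file.
                file_id = None
                matched.append(False)
        results.append(matched)
    return results


store = ReferenceStore()
store.add_file(b"AAAABBBBCCCC")
result = deduplicate(store, [b"AAAABBBBXXXXCCCC", b"ZZZZBBBB"])
print(result)            # [[True, True, False, False], [False, False]]
print(store.disk_reads)  # 2
```

In this run the second incoming file contains a cluster (`BBBB`) that exists in the store but is never compared, because its first-in-file hash did not match. This is the trade-off the abstract describes: fewer disk accesses in exchange for compression that approaches, but may not equal, a full comparison of all hashes.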

Public/Granted literature

US20190227726A1 DISK-IMAGE DEDUPLICATION WITH HASH SUBSET IN MEMORY Public/Granted day: 2019-07-25

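IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F3/00	Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
G06F3/06	.Digital input from, or digital output to, record carriers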