Progressive sampling for deduplication indexing

Invention Grant

US08311964B1 Progressive sampling for deduplication indexing (in force)

Patent Title: Progressive sampling for deduplication indexing
Application No.: US12617426

Application Date: 2009-11-12
Publication No.: US08311964B1

Publication Date: 2012-11-13
Inventor: Petros Efstathopoulos, Fanglu Guo, Dharmesh Shah
Applicant: Petros Efstathopoulos, Fanglu Guo, Dharmesh Shah
Applicant Address: US CA Mountain View
Assignee: Symantec Corporation
Current Assignee: Symantec Corporation
Current Assignee Address: US CA Mountain View
Agency: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
Agent: Rory D. Rankin
Main IPC: G06F17/00
IPC: G06F17/00; G06N5/00

Abstract:

A system and method for efficiently reducing a number of duplicate blocks of stored data. A file server both removes duplicate data and prevents duplicate data from being stored in the shared storage. A sampling rate may be used to determine which fingerprints, or hash values, are stored in an index. The sampling rate may be modified in response to changes in characteristics of the system, such as a change in the shared storage size, a change in a utilization of the shared storage, a change in the size of the storage unit, and reaching a threshold corresponding to utilization of the index. Also, a small cache may be maintained for holding fingerprint and pointer pair values prefetched from the shared storage. Each prefetched pair may be associated with data corresponding to a previous hit in the index. The association may be related to spatial locality, temporal locality, or otherwise.
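The sampling idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the names `fingerprint` and `SampledIndex`, the use of SHA-1, the rule that a fingerprint is sampled when it is divisible by `period` (a sampling rate of 1/`period`), and the policy of doubling `period` whenever the index exceeds its capacity. The abstract lists several possible triggers for changing the rate; only the index-utilization threshold is modeled.

```python
import hashlib


def fingerprint(block: bytes) -> int:
    """Return a block fingerprint as an integer (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(block).digest(), "big")


class SampledIndex:
    """Fingerprint index that stores only a sampled subset of fingerprints.

    Hypothetical policy: a fingerprint is sampled when fp % period == 0,
    so the sampling rate is 1 / period.
    """

    def __init__(self, capacity: int, period: int = 1):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity   # maximum number of index entries
        self.period = period       # sampling rate is 1 / period
        self.entries = {}          # fingerprint -> pointer (storage location)

    def is_sampled(self, fp: int) -> bool:
        return fp % self.period == 0

    def lookup(self, fp: int):
        """Return the stored pointer, or None if the fingerprint is not indexed."""
        return self.entries.get(fp)

    def insert(self, fp: int, pointer: int) -> None:
        if not self.is_sampled(fp):
            return  # not selected at the current sampling rate
        self.entries[fp] = pointer
        # Utilization threshold exceeded: lower the sampling rate and drop
        # entries that the new, sparser rate no longer selects.
        while len(self.entries) > self.capacity:
            self.period *= 2
            self.entries = {f: p for f, p in self.entries.items()
                            if self.is_sampled(f)}
```

Doubling `period` keeps every surviving entry consistent with the new rate, because any fingerprint divisible by the larger period was also divisible by the smaller one. The prefetch cache of fingerprint and pointer pairs described in the abstract is outside the scope of this sketch.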

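IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models, see G06N)
G06F17/00	Digital computing or data processing equipment or methods, specially adapted for specific functions (information retrieval, database structures or file system structures therefor, see G06F 16/00)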