Invention Grant
- Patent Title: Progressive sampling for deduplication indexing
- Application No.: US12617426
- Application Date: 2009-11-12
- Publication No.: US08311964B1
- Publication Date: 2012-11-13
- Inventor: Petros Efstathopoulos, Fanglu Guo, Dharmesh Shah
- Applicant: Petros Efstathopoulos, Fanglu Guo, Dharmesh Shah
- Applicant Address: US CA Mountain View
- Assignee: Symantec Corporation
- Current Assignee: Symantec Corporation
- Current Assignee Address: US CA Mountain View
- Agency: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
- Agent: Rory D. Rankin
- Main IPC: G06F17/00
- IPC: G06F17/00; G06N5/00

Abstract:
A system and method for efficiently reducing the number of duplicate blocks of stored data. A file server both removes duplicate data and prevents duplicate data from being stored in shared storage. A sampling rate may be used to determine which fingerprints, or hash values, are stored in an index. The sampling rate may be modified in response to changes in characteristics of the system, such as a change in the size of the shared storage, a change in utilization of the shared storage, a change in the size of the storage unit, or reaching a threshold corresponding to utilization of the index. In addition, a small cache may be maintained for holding fingerprint and pointer pairs prefetched from the shared storage. Each prefetched pair may be associated with data corresponding to a previous hit in the index. The association may be based on spatial locality, temporal locality, or another relationship.
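
The following is a minimal Python sketch of the idea summarized in the abstract, not the patented implementation. It assumes several details the abstract does not specify: the class name SampledDedupIndex, a fingerprint taken from the first 8 bytes of SHA-1, a power-of-two modulo rule for sampling, halving the sampling rate when the index exceeds a fixed entry limit, an append-only list standing in for shared storage, and a prefetch window of the next few stored blocks (spatial locality) triggered by any hit.

```python
import hashlib
from collections import OrderedDict


def fingerprint(block: bytes) -> int:
    """Return a 64-bit fingerprint taken from the SHA-1 hash of a block."""
    return int.from_bytes(hashlib.sha1(block).digest()[:8], "big")


class SampledDedupIndex:
    """Hypothetical sampled fingerprint index with a small prefetch cache."""

    def __init__(self, max_index_entries=1024, cache_size=64):
        self.sample_every = 1        # 1 means every fingerprint is indexed
        self.max_index_entries = max_index_entries
        self.index = {}              # sampled fingerprint -> position in log
        self.log = []                # stand-in for shared storage: (fp, block)
        self.cache = OrderedDict()   # prefetched fingerprint -> position (LRU)
        self.cache_size = cache_size

    def _sampled(self, fp):
        # Assumed sampling rule: keep fingerprints divisible by sample_every.
        return fp % self.sample_every == 0

    def _lower_sampling_rate(self):
        # When the index passes its utilization threshold, halve the sampling
        # rate and evict entries that no longer qualify. Because sample_every
        # stays a power of two, the new sample is a subset of the old one.
        while len(self.index) > self.max_index_entries:
            self.sample_every *= 2
            self.index = {fp: pos for fp, pos in self.index.items()
                          if self._sampled(fp)}

    def _prefetch(self, pos, window=4):
        # Load fingerprint/pointer pairs for blocks stored right after a hit.
        for p in range(pos + 1, min(pos + 1 + window, len(self.log))):
            fp = self.log[p][0]
            self.cache[fp] = p
            self.cache.move_to_end(fp)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict least recently added

    def write(self, block):
        """Store a block; return (position, was_duplicate)."""
        fp = fingerprint(block)
        pos = self.cache.get(fp)
        if pos is None:
            pos = self.index.get(fp)
        if pos is not None:
            self._prefetch(pos)
            return pos, True
        self.log.append((fp, block))
        pos = len(self.log) - 1
        if self._sampled(fp):
            self.index[fp] = pos
            self._lower_sampling_rate()
        return pos, False
```

In this sketch, a block whose fingerprint is neither sampled nor in the cache is stored again even if an identical block already exists. That is the trade-off a lower sampling rate accepts in exchange for a smaller index, and the prefetch cache is what recovers many of those misses when duplicates arrive in the same order as the original data.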