Invention Grant
- Patent Title: Web-scale distributed deduplication
- Application No.: US14876579
- Application Date: 2015-10-06
- Publication No.: US10540328B1
- Publication Date: 2020-01-21
- Inventor: Hariprasad Bhasker Rao Mankude
- Applicant: Cohesity, Inc.
- Applicant Address: US CA San Jose
- Assignee: Cohesity, Inc.
- Current Assignee: Cohesity, Inc.
- Current Assignee Address: US CA San Jose
- Agency: Van Pelt, Yi & James LLP
- Main IPC: G06F7/00
- IPC: G06F7/00 ; G06F17/30 ; G06F16/174 ; G06F16/182

Abstract:
Approaches for parallelized data deduplication. An instruction to perform data deduplication on a plurality of files is received. The plurality of files is organized into two or more work sets that each correspond to a subset of the plurality of files. Responsibility for performing each of said two or more work sets is assigned to a set of nodes in a cluster of nodes. The nodes may be physical nodes or virtual nodes. Each node in the set performs data deduplication on a different work set. In performing data deduplication, each node may store metadata describing where shared chunks of data are maintained in a distributed file system. The shared chunks of data are two or more sequences of bytes which appear in two or more of said plurality of files.
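
The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes fixed-size chunking, SHA-256 fingerprints, round-robin assignment of files to work sets, and local threads standing in for cluster nodes. All names (`make_work_sets`, `dedupe_work_set`, `parallel_dedupe`, `CHUNK_SIZE`) are hypothetical, and the final in-memory merge is an assumption standing in for metadata that the abstract says is kept in a distributed file system.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # hypothetical fixed chunk size in bytes, kept tiny for illustration


def make_work_sets(file_names, num_sets):
    """Organize files into work sets, each a subset of the files (round-robin)."""
    names = sorted(file_names)
    return [names[i::num_sets] for i in range(num_sets)]


def dedupe_work_set(files, work_set):
    """Work done by one node: map each chunk fingerprint to its (file, offset) locations."""
    index = {}
    for name in work_set:
        data = files[name]
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            index.setdefault(digest, []).append((name, offset))
    return index


def parallel_dedupe(files, num_nodes):
    """Assign one work set per simulated node and collect shared-chunk metadata."""
    work_sets = make_work_sets(files, num_nodes)
    # Threads stand in for physical or virtual nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial = list(pool.map(lambda ws: dedupe_work_set(files, ws), work_sets))

    # Merge the per-node indexes; a real system would persist this metadata instead.
    merged = {}
    for index in partial:
        for digest, locations in index.items():
            merged.setdefault(digest, []).extend(locations)

    # Keep only chunks that appear in two or more distinct files.
    return {
        digest: locations
        for digest, locations in merged.items()
        if len({name for name, _ in locations}) >= 2
    }


if __name__ == "__main__":
    files = {"a": b"AAAABBBB", "b": b"CCCCBBBB", "c": b"DDDD"}
    for digest, locations in parallel_dedupe(files, num_nodes=2).items():
        print(digest[:12], locations)
```

In this sketch each simulated node only ever reads the files in its own work set, which is what lets the work proceed in parallel; identifying chunks shared across work sets depends on the merged metadata.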