Invention Grant
- Patent Title: Selection of digest hash function for different data sets
- Application No.: US16381303
- Application Date: 2019-04-11
- Publication No.: US11308036B2
- Publication Date: 2022-04-19
- Inventor: Istvan Gonczi, Ivan Bassov, Sorin Faibish
- Applicant: EMC IP Holding Company LLC
- Applicant Address: US MA Hopkinton
- Assignee: EMC IP Holding Company LLC
- Current Assignee: EMC IP Holding Company LLC
- Current Assignee Address: US MA Hopkinton
- Agency: Muirhead and Saturnelli, LLC
- Main IPC: G06F16/00
- IPC: G06F16/00; G06F16/174; G06F16/13; G06F17/18; H04L9/06

Abstract:
Techniques for processing data may include: receiving a plurality of data chunks for a data set; performing data deduplication processing for the plurality of data chunks; determining, in accordance with one or more criteria, whether a frequency distribution of a frequency histogram of digest byte frequencies is sufficiently uniform; and, responsive to determining that the frequency distribution of the frequency histogram is not sufficiently uniform, performing processing to update data deduplication settings for the data set. Updating the data deduplication settings may include using a stronger hash algorithm and/or a larger digest size when generating subsequent digests. The data deduplication processing may include: determining, using a current hash algorithm, a plurality of digests for the plurality of data chunks of the data set; and updating the frequency histogram of digest byte frequencies for the data set in accordance with the plurality of digests.
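The loop the abstract describes (digest each chunk, tally digest byte values, test the histogram for uniformity, escalate the hash when the test fails) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not name its uniformity criteria, so a chi-square statistic against a uniform byte distribution, normalized by its 255 degrees of freedom, stands in for them. The thresholds, the `HASH_LADDER` escalation order, and all class and function names are hypothetical. `zlib.adler32` is a checksum rather than a hash and serves only as a deliberately weak starting point whose 4-byte output is visibly non-uniform on short text chunks.

```python
import hashlib
import zlib
from dataclasses import dataclass, field

# Hypothetical escalation ladder: each step is a stronger hash and/or a larger digest.
# "adler32" is a checksum, used here only as a deliberately weak stand-in.
HASH_LADDER = ["adler32", "sha256", "sha512"]


def compute_digest(algorithm: str, chunk: bytes) -> bytes:
    """Return the digest of one data chunk under the named algorithm."""
    if algorithm == "adler32":
        return zlib.adler32(chunk).to_bytes(4, "big")
    return hashlib.new(algorithm, chunk).digest()


def chi_square_per_dof(histogram: list[int]) -> float:
    """Chi-square statistic of a 256-bin byte histogram against a uniform
    distribution, divided by 255 degrees of freedom (about 1.0 when uniform)."""
    expected = sum(histogram) / 256
    chi2 = sum((count - expected) ** 2 / expected for count in histogram)
    return chi2 / 255


@dataclass
class DedupSettings:
    # Hypothetical criteria thresholds, not values taken from the patent.
    level: int = 0                  # index into HASH_LADDER
    min_digest_bytes: int = 2048    # wait for this many digest bytes before judging
    max_chi2_per_dof: float = 2.0   # above this, the distribution is "not uniform"

    @property
    def algorithm(self) -> str:
        return HASH_LADDER[self.level]


@dataclass
class DataSetDeduplicator:
    settings: DedupSettings = field(default_factory=DedupSettings)
    histogram: list[int] = field(default_factory=lambda: [0] * 256)
    index: dict[tuple[str, bytes], bytes] = field(default_factory=dict)
    duplicates: int = 0

    def process(self, chunks: list[bytes]) -> None:
        """Deduplicate one batch with the current hash, then re-check uniformity."""
        algorithm = self.settings.algorithm
        for chunk in chunks:
            digest = compute_digest(algorithm, chunk)
            key = (algorithm, digest)
            stored = self.index.get(key)
            if stored is None:
                self.index[key] = chunk
                # Assumption: only digests of new chunks feed the histogram,
                # so repeated data does not skew the byte-frequency distribution.
                for b in digest:
                    self.histogram[b] += 1
            elif stored == chunk:
                # Bytes verified equal, so this is a true duplicate.
                self.duplicates += 1
            # Otherwise: a digest collision between different chunks; a real
            # system would store the chunk separately rather than deduplicate it.
        if not self.is_sufficiently_uniform():
            self.escalate()

    def is_sufficiently_uniform(self) -> bool:
        if sum(self.histogram) < self.settings.min_digest_bytes:
            return True  # too few samples to judge; keep current settings
        return chi_square_per_dof(self.histogram) <= self.settings.max_chi2_per_dof

    def escalate(self) -> None:
        """Switch subsequent digests to the next (stronger/larger) hash, if any."""
        if self.settings.level + 1 < len(HASH_LADDER):
            self.settings.level += 1
            # Assumption: restart the histogram so it reflects only the new hash.
            self.histogram = [0] * 256
```

Fed 1,000 random 64-byte lowercase-text chunks, two of the four Adler-32 digest bytes fall into a narrow range of values, so the first `process` call moves the data set to SHA-256; later calls stay on SHA-256 because its digest bytes are close to uniform. Keying the index by `(algorithm, digest)` means chunks digested before a switch are not matched against chunks digested after it; how to reconcile the two is left open in this sketch.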
Public/Granted literature
- US20200327098A1 SELECTION OF DIGEST HASH FUNCTION FOR DIFFERENT DATA SETS (Public/Granted day: 2020-10-15)