Hash based de-duplication in a storage system
Abstract:
A method for de-duplication, the method may include receiving a request to store in a storage system a received data entity; obtaining a received data entity signature that is responsive to the received data entity; selecting a selected data structure out of a set of data structures that comprises K data structures; wherein K is a positive integer; wherein for each value of a variable k that ranges between 2 and K, a stored data entity signature that is stored in a k'th data structure out of the set collided with stored data entity signatures that are stored in each one of a first till (k−1)'th data structures of the set; calculating an index by applying, on the received data entity signature, a hash function that is associated with the selected data structure; determining whether an entry that is associated with the index and belongs to the selected data structure is empty; writing to the entry, if the entry is empty, the received data entity signature, and storing the received data entity in the storage system in response to a location of the entry in the set; selecting, if (a) the entry is not empty and (b) the received data entity signature differs from a stored data entity signature that is stored in the entry, a new data structure of the set, and repeating at least the stages of calculating and determining.
Public/Granted literature
Information query
Patent Agency Ranking
0/0