Abstract:
A computer-implemented method for caching content in a cache memory device is disclosed. The method starts with receiving a request for accessing a first data block associated with a first file, whereupon a file manager provides access to the first data block in a persistent storage device of a storage system. The file manager then caches the first data block in a cache memory device, which includes deduplicating the first data block, wherein at least some of the data blocks stored in the cache memory device are deduplicated data blocks, and wherein at least one of the data blocks is referenced by different regions of the same file or by different files.
Abstract:
A computer-implemented method for indexing content stored in a cache memory device is disclosed. The method starts with generating, in response to receiving a first request for caching a first file extent associated with a first file in a cache memory device, a first fingerprint based on content of the first file extent. The method continues with searching a fingerprint index based on the first fingerprint to determine whether the first file extent has been stored in the cache memory device. In response to determining that a fingerprint entry matching the first fingerprint is found, the method continues with associating a first identifier, which identifies the first file extent and the first file, with a storage location of the cache memory device obtained from the matching fingerprint entry, without storing the first file extent in the cache memory device.
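The lookup described in this abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the class name `DedupCache`, the use of SHA-256 as the fingerprint function, and the in-memory list standing in for the cache memory device are all assumptions made for illustration.

```python
import hashlib


class DedupCache:
    """Toy cache: a fingerprint index plus a (file, extent) -> location map."""

    def __init__(self):
        self.storage = []            # stands in for the cache memory device
        self.fingerprint_index = {}  # fingerprint -> storage location
        self.extent_index = {}       # (file_id, extent_id) -> storage location

    def cache_extent(self, file_id, extent_id, content: bytes) -> bool:
        """Cache an extent; return True only if new content was stored."""
        # Assumption: SHA-256 serves as the content fingerprint.
        fp = hashlib.sha256(content).digest()
        location = self.fingerprint_index.get(fp)
        stored = False
        if location is None:
            # No matching fingerprint entry: store the extent and index it.
            location = len(self.storage)
            self.storage.append(content)
            self.fingerprint_index[fp] = location
            stored = True
        # Whether or not content was stored, map the identifier to the location.
        self.extent_index[(file_id, extent_id)] = location
        return stored

    def read_extent(self, file_id, extent_id) -> bytes:
        return self.storage[self.extent_index[(file_id, extent_id)]]
```

Caching the same content under a second identifier leaves `storage` unchanged and only adds an entry to `extent_index`, which is the "without storing" branch of the abstract.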
Abstract:
In response to a request from a client to store a data block in a storage system, the data block is segmented into a plurality of subblocks. Each of the plurality of subblocks is individually compressed into a compressed subblock. The compressed subblocks are packed into a compressed data block. The compressed data block having the individually compressed subblocks therein is stored in a persistent storage device. Metadata of the compressed data block is stored in an index entry in an index of the storage system, including storing subblock locators indicating locations of the compressed subblocks. Each of the subblocks can be individually accessed based on a corresponding subblock locator without having to access remaining subblocks.
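As a rough illustration of the packing step, the sketch below compresses each subblock separately and records a locator for each one. The function names, the fixed `subblock_size`, the `(offset, length)` locator layout, and the choice of zlib are assumptions, not details taken from the abstract.

```python
import zlib


def pack_block(block: bytes, subblock_size: int = 4096):
    """Compress each subblock on its own and pack the results back to back."""
    packed = bytearray()
    locators = []  # assumed layout: (offset, length) of each compressed subblock
    for start in range(0, len(block), subblock_size):
        compressed = zlib.compress(block[start:start + subblock_size])
        locators.append((len(packed), len(compressed)))
        packed += compressed
    return bytes(packed), locators


def read_subblock(packed: bytes, locators, i: int) -> bytes:
    """Decompress subblock i without touching the other subblocks."""
    offset, length = locators[i]
    return zlib.decompress(packed[offset:offset + length])
```

Because every subblock is a complete zlib stream, `read_subblock` needs only the bytes named by one locator, at the cost of a somewhat lower compression ratio than compressing the whole block at once.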
Abstract:
In response to a request for stored data, an index entry is retrieved based on an identifier of the requested data, the index entry corresponding to an indexed block of storage containing the requested data. The index entry includes a start location of the indexed storage block and sub-block locators that identify the start of one or more sub-blocks within the indexed storage block. The sub-block containing the requested data is determined, and the corresponding sub-block locator is read to find the starting location of the sub-block. Without reading the entire indexed storage block, the sub-block may be read from its starting location and decompressed, and the decompressed requested data read from the sub-block may be transmitted to the client. In this way, fewer I/O operations are needed to read the requested data, and the memory needed for storing index information is minimized.
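The read path can be sketched as follows. This is an illustration under stated assumptions: the `IndexEntry` fields, a fixed uncompressed sub-block size (so the sub-block is found by integer division), zlib compression, and a seekable file-like object standing in for the storage device are all hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass
class IndexEntry:
    block_start: int       # start of the indexed block on the device
    block_length: int      # total compressed length of the block
    subblock_starts: list  # start of each sub-block, relative to block_start
    subblock_size: int     # uncompressed bytes per sub-block (assumed fixed)


def read_range(device, entry: IndexEntry, offset: int, length: int) -> bytes:
    """Read `length` bytes at logical `offset`, assumed to lie in one sub-block."""
    i = offset // entry.subblock_size  # which sub-block holds the data
    start = entry.subblock_starts[i]
    if i + 1 < len(entry.subblock_starts):
        end = entry.subblock_starts[i + 1]
    else:
        end = entry.block_length
    # Seek straight to the sub-block; the rest of the block is never read.
    device.seek(entry.block_start + start)
    data = zlib.decompress(device.read(end - start))
    inner = offset % entry.subblock_size
    return data[inner:inner + length]
```

Here the locators store only start offsets, and each sub-block's length is derived from the next locator, which keeps the index entry small.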
Abstract:
Techniques for improving data compression of a deduplicated storage system are described herein. According to one embodiment, the similarity of a plurality of data chunks stored in one or more first storage areas of the storage system is determined based on a plurality of sketches, each describing characteristics of one of the data chunks. The data chunks are grouped into a plurality of groups of similar data chunks based on the similarity of the data chunks. The groups of similar data chunks are compressed, such that similar data chunks are compressed close to each other.
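One way to picture sketch-based grouping is shown below. The abstract does not say how sketches are computed; this example assumes a toy sketch (the largest CRC32 over all k-byte windows of a chunk) and treats chunks with equal sketches as similar. The function names are hypothetical.

```python
import zlib
from collections import defaultdict


def sketch(chunk: bytes, k: int = 8) -> int:
    """Toy sketch (an assumption): largest CRC32 over all k-byte windows."""
    if len(chunk) < k:
        return zlib.crc32(chunk)
    return max(zlib.crc32(chunk[i:i + k]) for i in range(len(chunk) - k + 1))


def group_by_sketch(chunks):
    """Place chunks with equal sketches in the same group."""
    groups = defaultdict(list)
    for c in chunks:
        groups[sketch(c)].append(c)
    return list(groups.values())


def compress_groups(groups):
    """Compress each group as one stream so similar chunks sit side by side."""
    return [zlib.compress(b"".join(g)) for g in groups]
```

Two chunks that share most of their windows tend to share the maximum-valued window as well, so they tend to land in the same group and are compressed next to each other.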
Abstract:
A computer-implemented method for caching content in a cache memory device is disclosed. The method starts with receiving, at a cache manager, one or more data chunks to be cached in a cache memory device, where the one or more data chunks are retrieved from a persistent storage disk of a storage system in response to a read request for a region of a file. The one or more data chunks of a file extent are then compressed using a predetermined compression algorithm, and the file extent is packed into a write-evict unit (WEU) that is maintained in a random-access memory (RAM) and has been opened to store a plurality of file extents. In response to determining that the WEU is full, the cache manager writes the WEU from the RAM into the cache memory device.
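The WEU flow can be sketched as below. The class name `WEUCache`, the byte-capacity definition of "full", zlib as the compression algorithm, and a Python list standing in for the cache memory device are assumptions for illustration only.

```python
import zlib


class WEUCache:
    """Toy cache manager that packs compressed extents into an open WEU in RAM."""

    def __init__(self, weu_capacity: int):
        self.weu_capacity = weu_capacity
        self.open_weu = bytearray()  # the open WEU, held in RAM
        self.device = []             # stands in for the cache memory device

    def add_extent(self, chunks):
        """Compress the chunks of one file extent and pack it into the open WEU."""
        extent = zlib.compress(b"".join(chunks))
        # Assumed policy: the WEU is "full" if the next extent would not fit.
        if self.open_weu and len(self.open_weu) + len(extent) > self.weu_capacity:
            self.flush()
        self.open_weu += extent
        if len(self.open_weu) >= self.weu_capacity:
            self.flush()

    def flush(self):
        """Write the whole WEU to the device in one operation and open a new one."""
        self.device.append(bytes(self.open_weu))
        self.open_weu = bytearray()
```

Writing whole units rather than individual extents means the device sees a small number of large writes, which is the usual motivation for this kind of batching on flash.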
Abstract:
A system for directing for storage comprises a processor and a memory. The processor is configured to determine a segment overlap for each of a plurality of nodes. The processor is further configured to determine a selected node of the plurality of nodes based at least in part on the segment overlap for each of the plurality of nodes and based at least in part on a selection criterion. The memory is coupled to the processor and configured to provide the processor with instructions.
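A minimal sketch of overlap-based node selection follows. The abstract defines neither the overlap measure nor the selection criterion; this example assumes overlap is the count of incoming segment fingerprints already present on a node, and that the criterion is "highest overlap, ties broken by the node holding fewer segments". The function name `select_node` is hypothetical.

```python
def select_node(segment_fps, nodes):
    """Pick a node for the incoming segments.

    nodes maps a node name to the set of segment fingerprints it already stores.
    """
    incoming = set(segment_fps)
    # Assumed overlap measure: incoming fingerprints already on each node.
    overlaps = {name: len(incoming & stored) for name, stored in nodes.items()}
    # Assumed criterion: maximize overlap, then prefer the emptier node.
    best = max(nodes, key=lambda n: (overlaps[n], -len(nodes[n])))
    return best, overlaps
```

For example, with `nodes = {"n1": {1, 2, 3}, "n2": {3, 4}, "n3": set()}`, the call `select_node([3, 4, 5], nodes)` selects `"n2"`, which already holds two of the three incoming segments.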
Abstract:
A computer-implemented method for indexing content stored in a cache memory device is disclosed. The method starts with maintaining a fingerprint index having a plurality of fingerprint entries, each mapping a fingerprint to a storage location of a cache memory device, where the cache memory device caches some of the data blocks stored in a persistent storage device of a storage system, and where the fingerprint index is a partial index that indexes only a portion of the data stored in the cache memory device. In response to receiving a request to insert a new fingerprint, the method continues with evicting one of the fingerprint entries according to a predetermined eviction algorithm and inserting the new fingerprint into the evicted fingerprint entry.
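A bounded fingerprint index with eviction might look like the sketch below. The abstract leaves the eviction algorithm open; least-recently-used (LRU) is assumed here, as is evicting only when the index is at capacity. The class name is hypothetical.

```python
from collections import OrderedDict


class PartialFingerprintIndex:
    """Fixed-capacity fingerprint index; assumes LRU as the eviction algorithm."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # fingerprint -> cache location, oldest first

    def lookup(self, fp):
        """Return the cached location for fp, or None if fp is not indexed."""
        loc = self.entries.get(fp)
        if loc is not None:
            self.entries.move_to_end(fp)  # mark as most recently used
        return loc

    def insert(self, fp, location):
        """Insert fp, reusing the slot of the least recently used entry if full."""
        if fp in self.entries:
            self.entries.move_to_end(fp)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[fp] = location
```

Because the index is partial, a `None` from `lookup` does not prove the block is absent from the cache; it only means a duplicate may go undetected, which costs space but not correctness.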
Abstract:
Techniques for improving data compression of a storage system are described herein. According to one embodiment, a first sequence of data is partitioned into a plurality of data chunks in a first sequence order according to a predetermined chunking algorithm. The similarity of the data chunks is determined based on data patterns of the data chunks. The data chunks are reorganized into a second sequence order based on the similarity of the data chunks, the second sequence order being different from the first sequence order. The reorganized data chunks are compressed in the second sequence order into a second sequence of data, such that similar data chunks are stored and compressed together within the second sequence of data.
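The reorder-then-compress idea can be sketched as follows. Fixed-size chunking and a lexicographic sort as a crude stand-in for similarity are assumptions, as are zlib and the function names; the abstract specifies none of these. The original order and chunk lengths are kept so that the first sequence can be restored.

```python
import zlib


def reorder_and_compress(data: bytes, size: int = 64):
    """Chunk, reorder so alike chunks are adjacent, then compress the result."""
    # Assumption: fixed-size chunking stands in for the chunking algorithm.
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Assumption: sorting by content is a crude proxy for grouping by similarity.
    order = sorted(range(len(chunks)), key=lambda i: chunks[i])
    lengths = [len(c) for c in chunks]
    compressed = zlib.compress(b"".join(chunks[i] for i in order))
    return compressed, order, lengths


def restore(compressed: bytes, order, lengths) -> bytes:
    """Undo the reordering to recover the first sequence of data."""
    data = zlib.decompress(compressed)
    chunks = [b""] * len(order)
    pos = 0
    for i in order:
        chunks[i] = data[pos:pos + lengths[i]]
        pos += lengths[i]
    return b"".join(chunks)
```

The `order` and `lengths` lists are the extra metadata this approach has to store; a real system would weigh that overhead against the compression gain.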