Abstract:
A system for directing for storage comprises a processor and a memory. The processor is configured to determine a segment overlap for each of a plurality of nodes. The processor is further configured to determine a selected node of the plurality of nodes based at least in part on the segment overlap for each of the plurality of nodes and based at least in part on a selection criterion. The memory is coupled to the processor and configured to provide the processor with instructions.
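The node selection described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: "segment overlap" is taken to mean the number of an incoming segment's fingerprints already stored on a node, and the selection criterion is assumed to be "highest overlap, ties broken by lowest load". The names `select_node`, `node_fps`, and `node_load` are hypothetical and do not come from the abstract.

```python
from typing import Dict, Set


def select_node(
    segment_fps: Set[str],
    node_fps: Dict[str, Set[str]],
    node_load: Dict[str, int],
) -> str:
    """Return the node that should store the segment.

    Assumed criterion (not from the source): prefer the node with the
    largest fingerprint overlap; break ties by the lowest current load,
    then by node name so the result is deterministic.
    """
    def sort_key(node: str):
        overlap = len(segment_fps & node_fps[node])
        return (-overlap, node_load[node], node)

    return min(node_fps, key=sort_key)


if __name__ == "__main__":
    nodes = {"a": {"f1", "f2"}, "b": {"f2", "f3", "f4"}}
    load = {"a": 10, "b": 50}
    print(select_node({"f2", "f3", "f9"}, nodes, load))
```

Folding the overlap and the tie-breaker into one sort key keeps the criterion in a single place, so a different criterion (for example, one that caps load) would only change `sort_key`.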
Abstract:
Techniques for balancing data compression and read performance of data chunks of a storage system are described herein. According to one embodiment, similar data chunks are identified based on sketches of a plurality of data chunks stored in the storage system. A first portion of the similar data chunks is associated, as a first group, with a first storage area. The first storage area is also associated with one or more data chunks that are dissimilar to the first group but are likely to be accessed together with it. The first group of the similar data chunks and its associated dissimilar data chunks are compressed and stored in the first storage area.
Abstract:
One embodiment provides a document management system comprising a storage system to store one or more encrypted documents, at least a first portion of a first encrypted document being encrypted using a first encryption key, and an encryption key manager to manage a set of encryption keys for the documents on the storage system, the encryption key manager further to discard the first encryption key to provide secure removal of the first portion of the first encrypted document.
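The idea of removing a document portion by discarding its key can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `KeyManager` class and `xor_keystream` function are hypothetical, and the cipher is a toy keystream built from SHA-256 that is NOT suitable for real use. A real system would use a vetted encryption library.

```python
import hashlib
import secrets
from typing import Dict, Optional


def xor_keystream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher for illustration only (NOT secure).

    XORs data with SHA-256(key || block offset); applying it twice
    with the same key returns the original data.
    """
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hashlib.sha256(key + start.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


class KeyManager:
    """Hypothetical key manager holding one key per document portion."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def new_key(self, key_id: str) -> bytes:
        self._keys[key_id] = secrets.token_bytes(32)
        return self._keys[key_id]

    def get(self, key_id: str) -> Optional[bytes]:
        return self._keys.get(key_id)

    def discard(self, key_id: str) -> None:
        # Once the key is gone, the stored ciphertext can no longer be read.
        self._keys.pop(key_id, None)


if __name__ == "__main__":
    km = KeyManager()
    stored = xor_keystream(km.new_key("doc1/part1"), b"confidential clause")
    print(xor_keystream(km.get("doc1/part1"), stored))
    km.discard("doc1/part1")
    print(km.get("doc1/part1"))
```

The ciphertext itself never has to be located or overwritten; only the key is deleted, which is what makes the removal practical when copies of the encrypted data exist in several places.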
Abstract:
Techniques for improving data compression of a storage system in an online manner are described herein. According to one embodiment, in response to a sequence of data to be stored, the sequence of data is partitioned into a plurality of data chunks according to a predetermined chunking algorithm. A sketch for each of the data chunks is generated based on one or more features extracted from the data chunk. Each of the data chunks of the sequence of data is associated with one of a plurality of groups based on the sketch, wherein each group is represented by a sketch. The data chunks of each group are compressed and stored in a compression region of the storage system, such that similar data chunks are compressed and stored in the same compression region.
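A minimal sketch of the chunk, sketch, group, and compress pipeline described in this abstract follows. The specifics are assumptions chosen for brevity rather than the method the abstract claims: fixed-size chunking stands in for the "predetermined chunking algorithm", the sketch is a single feature (the maximum hash over sliding byte windows), and `zlib` stands in for the compressor. All function names are hypothetical.

```python
import hashlib
import zlib
from collections import defaultdict
from typing import Dict, List


def chunk_fixed(data: bytes, size: int = 64) -> List[bytes]:
    """Assumed chunking algorithm: fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def sketch(chunk: bytes, window: int = 8) -> int:
    """Single-feature sketch: max hash over all sliding windows.

    Chunks that share the window producing the maximum hash get the
    same sketch even if other bytes differ.
    """
    positions = range(max(1, len(chunk) - window + 1))
    return max(
        int.from_bytes(
            hashlib.blake2b(chunk[i:i + window], digest_size=8).digest(), "big"
        )
        for i in positions
    )


def compress_by_sketch(data: bytes) -> Dict[int, bytes]:
    """Group chunks by sketch and compress each group as one region."""
    groups: Dict[int, List[bytes]] = defaultdict(list)
    for chunk in chunk_fixed(data):
        groups[sketch(chunk)].append(chunk)
    return {s: zlib.compress(b"".join(chunks)) for s, chunks in groups.items()}


if __name__ == "__main__":
    regions = compress_by_sketch(b"A" * 64 + b"B" * 64 + b"A" * 64)
    for s, blob in regions.items():
        print(s, len(blob))
```

Note that this sketch does not record the original chunk order; a real system would keep per-chunk metadata so the sequence can be reassembled from its compression regions.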
Abstract:
A computer-implemented method for indexing content stored in a cache memory device is disclosed. The method starts with maintaining a file index having a plurality of extent entries, each extent entry corresponding to one of a plurality of file extents stored in a cache memory device that caches data stored in a persistent storage device of a storage system. In response to receiving a request to read a first file region of a first file, the method continues with retrieving a first data block from the persistent storage device that contains the first data block, caching the first data block at a first storage location of the cache memory device, and creating a first extent entry in the file index having at least a first node, where the first node includes an address of the first storage location and a first bitmap indicating which data blocks are valid.
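The read path in this abstract (miss, fetch from persistent storage, cache the block, record it in an extent entry with a validity bitmap) can be sketched as below. The layout is an assumption for illustration: each extent covers a fixed number of blocks, the persistent device is modeled as a dictionary, and the names `ExtentNode`, `CachedFileIndex`, and `BLOCKS_PER_EXTENT` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

BLOCKS_PER_EXTENT = 8  # assumed extent size, not taken from the abstract


@dataclass
class ExtentNode:
    cache_addr: int        # cache address where this extent's blocks start
    valid_bitmap: int = 0  # bit i set means block i of the extent is cached


class CachedFileIndex:
    def __init__(self, persistent: Dict[Tuple[str, int], bytes]) -> None:
        self.persistent = persistent  # stand-in for the persistent device
        self.cache: Dict[int, bytes] = {}
        self.index: Dict[Tuple[str, int], ExtentNode] = {}
        self._next_addr = 0

    def read_block(self, file_id: str, block_no: int) -> bytes:
        extent_no, offset = divmod(block_no, BLOCKS_PER_EXTENT)
        node = self.index.get((file_id, extent_no))
        if node is not None and (node.valid_bitmap >> offset) & 1:
            return self.cache[node.cache_addr + offset]  # cache hit

        # Cache miss: fetch from persistent storage, then index the block.
        data = self.persistent[(file_id, block_no)]
        if node is None:
            node = ExtentNode(cache_addr=self._next_addr)
            self._next_addr += BLOCKS_PER_EXTENT
            self.index[(file_id, extent_no)] = node
        self.cache[node.cache_addr + offset] = data
        node.valid_bitmap |= 1 << offset
        return data


if __name__ == "__main__":
    idx = CachedFileIndex({("f", 3): b"block three"})
    print(idx.read_block("f", 3), bin(idx.index[("f", 0)].valid_bitmap))
```

The bitmap lets one extent entry describe a partially cached extent, so the index does not need a separate entry per block.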
Abstract:
An operating state of each of a plurality of storage units of a storage system is periodically monitored, including a storage capacity, a throughput, and an overlap of clients associated with the storage units. In response to a request to redistribute data from a first of the storage units to another storage unit, a cost factor for relocating the data of the first storage unit is determined for each of the remaining storage units. The cost factor of each remaining storage unit is determined based on at least one of the storage capacity, the throughput, or the overlap of clients of that storage unit. A second of the storage units having the lowest cost factor among the remaining storage units is selected. At least a portion of the data of the first storage unit is migrated to the second storage unit.
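One way to picture the cost-factor selection is a weighted sum over the three monitored quantities. The formula, the weights, and the field names below are assumptions made for illustration; the abstract does not specify how the cost factor is computed. In particular, this sketch assumes that a higher client overlap with the source unit lowers the cost.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnitState:
    name: str
    free_capacity_gb: float
    utilization: float     # fraction of throughput in use, 0.0 to 1.0
    client_overlap: float  # fraction of the source unit's clients seen here


def cost_factor(unit: UnitState, data_gb: float,
                w_cap: float = 1.0, w_tp: float = 1.0,
                w_overlap: float = 1.0) -> float:
    """Hypothetical cost: lower is better; infinite if the data cannot fit."""
    if unit.free_capacity_gb < data_gb:
        return float("inf")
    return (w_cap * data_gb / unit.free_capacity_gb
            + w_tp * unit.utilization
            + w_overlap * (1.0 - unit.client_overlap))


def select_target(units: List[UnitState], data_gb: float) -> str:
    """Pick the remaining unit with the lowest cost factor."""
    best = min(units, key=lambda u: cost_factor(u, data_gb))
    if cost_factor(best, data_gb) == float("inf"):
        raise ValueError("no storage unit can hold the data")
    return best.name


if __name__ == "__main__":
    remaining = [
        UnitState("a", 100.0, 0.9, 0.1),
        UnitState("b", 200.0, 0.2, 0.8),
        UnitState("c", 10.0, 0.0, 1.0),
    ]
    print(select_target(remaining, data_gb=50.0))
```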
Abstract:
Techniques for reducing and discouraging the sending of large-scale emails are described herein. According to one embodiment, in response to a first email received from a sender to be sent to a list of recipients, a distribution cost of the first email is determined based on content of the first email and the recipients. If the distribution cost of the first email is above a first predetermined threshold, an email client application presents a first graphical user interface (GUI) page to the sender prompting a confirmation from the sender, where the first GUI page includes information indicating a size of the first email and a number of recipients. In response to a positive confirmation from the sender, the first email is sent to the intended recipients.
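The threshold check in this abstract might look like the following sketch. The cost formula (message size in bytes multiplied by recipient count) is an assumption, since the abstract only says the cost depends on the email's content and recipients. The function names and the prompt wording are hypothetical.

```python
from typing import List, Optional


def distribution_cost(body: str, attachment_bytes: int,
                      recipients: List[str]) -> int:
    """Assumed cost: total bytes delivered across all recipients."""
    size = len(body.encode("utf-8")) + attachment_bytes
    return size * len(recipients)


def confirmation_prompt(body: str, attachment_bytes: int,
                        recipients: List[str], threshold: int) -> Optional[str]:
    """Return the text a GUI would show, or None if no prompt is needed."""
    if distribution_cost(body, attachment_bytes, recipients) <= threshold:
        return None
    size = len(body.encode("utf-8")) + attachment_bytes
    return (f"This email is {size} bytes and will go to "
            f"{len(recipients)} recipients. Send anyway?")


if __name__ == "__main__":
    everyone = [f"user{i}@example.com" for i in range(1000)]
    print(confirmation_prompt("hi", 1000, everyone, threshold=10_000))
```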
Abstract:
Techniques for determining vulnerability of disks are described herein. According to one embodiment, for each of a plurality of disks representing a redundant array of independent disks (RAID), a reallocated sector count associated with the disk is obtained, the reallocated sector count representing a number of sectors that have been reallocated due to an error of a storage transaction to the disk. A failure probability of the disk given the obtained reallocated sector count is determined using a predictive model, wherein the predictive model was generated based on historical operating data of a set of known disks. Thereafter, a failure probability of at least two of the disks in the RAID is determined based on the failure probability of each of the disks to determine vulnerability of the RAID.
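The last step, combining per-disk probabilities into the probability that at least two disks fail, can be worked out as 1 minus the probability that no disk fails minus the probability that exactly one fails. The sketch below assumes disk failures are independent, and the lookup table standing in for the predictive model uses made-up values purely for illustration.

```python
import bisect
from math import prod
from typing import List

# Hypothetical model: lower bounds of reallocated-sector-count buckets and
# the failure probability assigned to each bucket. Values are invented.
RSC_BOUNDS = [0, 10, 100, 1000]
FAIL_PROB = [0.01, 0.05, 0.20, 0.50]


def disk_failure_probability(reallocated_sectors: int) -> float:
    """Look up the failure probability for a reallocated sector count."""
    i = bisect.bisect_right(RSC_BOUNDS, reallocated_sectors) - 1
    return FAIL_PROB[max(i, 0)]


def prob_at_least_two_fail(probs: List[float]) -> float:
    """P(>= 2 failures) = 1 - P(0 failures) - P(exactly 1 failure).

    Assumes independent disk failures.
    """
    p_none = prod(1 - p for p in probs)
    p_one = sum(
        p * prod(1 - q for j, q in enumerate(probs) if j != i)
        for i, p in enumerate(probs)
    )
    return 1 - p_none - p_one


def raid_vulnerability(sector_counts: List[int]) -> float:
    return prob_at_least_two_fail(
        [disk_failure_probability(c) for c in sector_counts]
    )


if __name__ == "__main__":
    print(raid_vulnerability([0, 5, 150, 2000]))
```

"At least two" matters because a single-parity RAID group survives one disk failure but loses data on a second concurrent failure.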
Abstract:
A computer-implemented method for indexing content stored in a cache memory device is disclosed. The method starts with maintaining a file index having a plurality of extent entries, each extent entry corresponding to one of a plurality of file extents stored in a cache memory device that caches data stored in a persistent storage device of a storage system. The method continues with maintaining a fingerprint index having a plurality of fingerprint entries, each mapping a fingerprint to a data region of a file indexed in the file index, wherein each fingerprint indexed in the fingerprint index is retrieved from metadata stored in the persistent storage device of the storage system when one or more corresponding data chunks were accessed, and deduplicating and accessing the file extents stored in the cache memory device using the file index and the fingerprint index.
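A compact sketch of how a file index and a fingerprint index could cooperate to deduplicate cached extents follows. It is an assumption-laden illustration: fingerprints are passed in by the caller (standing in for fingerprints read from persistent-storage metadata), extents are keyed by file and offset, and the `DedupCache` class and its methods are hypothetical.

```python
from typing import Dict, Optional, Tuple

Region = Tuple[str, int]  # (file_id, offset)


class DedupCache:
    def __init__(self) -> None:
        self.cache: Dict[int, bytes] = {}        # cache address -> data
        self.file_index: Dict[Region, int] = {}  # file region -> cache address
        self.fp_index: Dict[str, Region] = {}    # fingerprint -> file region

    def insert(self, file_id: str, offset: int,
               fingerprint: str, data: bytes) -> bool:
        """Cache a region; return True only if new data was stored."""
        known = self.fp_index.get(fingerprint)
        if known is not None:
            # Duplicate content: point this region at the existing copy.
            self.file_index[(file_id, offset)] = self.file_index[known]
            return False
        addr = len(self.cache)
        self.cache[addr] = data
        self.file_index[(file_id, offset)] = addr
        self.fp_index[fingerprint] = (file_id, offset)
        return True

    def read(self, file_id: str, offset: int) -> Optional[bytes]:
        addr = self.file_index.get((file_id, offset))
        return None if addr is None else self.cache[addr]


if __name__ == "__main__":
    c = DedupCache()
    print(c.insert("a.txt", 0, "fp1", b"same bytes"))
    print(c.insert("b.txt", 4096, "fp1", b"same bytes"))
    print(len(c.cache), c.read("b.txt", 4096))
```

This sketch omits eviction; a real cache would need to track how many file regions reference each cached copy before freeing it.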