Abstract:
An operating state of each of a plurality of storage units of a storage system is periodically monitored, including a storage capacity, a throughput, and an overlap of clients associated with the storage units. In response to a request to redistribute data from a first of the storage units to another storage unit, a cost factor for relocating the data of the first storage unit is determined for each of the remaining storage units. The cost factor of each remaining storage unit is determined based on at least one of the storage capacity, the throughput, or the overlap of clients of that storage unit. A second of the storage units having the lowest cost factor amongst the remaining storage units is selected. At least a portion of the data of the first storage unit is migrated to the second storage unit.
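The selection step described above can be illustrated with a minimal sketch. The abstract does not give a cost formula, so the `StorageUnit` fields, the equal weighting of the three terms, and the choice to treat higher client overlap as lower cost are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class StorageUnit:
    # All field names are hypothetical; the abstract names only the three metrics.
    name: str
    free_capacity_gb: float
    throughput_mbps: float
    clients: set = field(default_factory=set)


def cost_factor(source, candidate, data_size_gb):
    """Assumed cost of moving the source's data to the candidate (lower is better)."""
    if candidate.free_capacity_gb < data_size_gb:
        return float("inf")  # the candidate cannot hold the data at all
    capacity_cost = data_size_gb / candidate.free_capacity_gb
    throughput_cost = 1.0 / candidate.throughput_mbps
    # Assumption: units that already serve the same clients are preferred.
    union = source.clients | candidate.clients
    overlap = len(source.clients & candidate.clients) / len(union) if union else 0.0
    return capacity_cost + throughput_cost + (1.0 - overlap)


def select_target(source, units, data_size_gb):
    """Pick the remaining unit with the lowest cost factor."""
    remaining = [u for u in units if u is not source]
    return min(remaining, key=lambda u: cost_factor(source, u, data_size_gb))


first = StorageUnit("a", 100, 100, {"c1", "c2"})
second = StorageUnit("b", 500, 200, {"c1", "c2"})
third = StorageUnit("c", 50, 400, {"c3"})
target = select_target(first, [first, second, third], data_size_gb=80)
print(target.name)
```

In practice the monitored values would be refreshed periodically, as the abstract states, and the weights would be tuned rather than fixed at one.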
Abstract:
A system for storing data comprises a performance storage unit and a performance segment storage unit. The system further comprises a determiner. The determiner determines whether requested data is stored in the performance storage unit. In the event that the requested data is not stored in the performance storage unit, the determiner determines whether the requested data is stored in the performance segment storage unit.
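The two-step check performed by the determiner can be sketched as a simple fallback lookup. The dictionaries standing in for the two storage units and the function name `locate` are hypothetical; the abstract does not describe how either unit is implemented.

```python
def locate(key, performance_unit, performance_segment_unit):
    """Check the performance storage unit first, then the segment storage unit."""
    if key in performance_unit:
        return "performance", performance_unit[key]
    if key in performance_segment_unit:
        return "segment", performance_segment_unit[key]
    return "miss", None


performance_unit = {"file-1": b"hot data"}
performance_segment_unit = {"file-2": b"segmented data"}
print(locate("file-2", performance_unit, performance_segment_unit))
```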
Abstract:
Cluster storage is disclosed. A data stream or a data block is received. The data stream or the data block is broken into segments. For each segment, a cluster node is selected, and a portion of the segment, smaller than the whole segment, is identified as a duplicate of a portion of a segment already managed by the cluster node.
Abstract:
Transmitting or storing subsegments is disclosed. A data stream or a data block is received and broken into a plurality of segments. For at least one segment, the segment is broken into a plurality of subsegments. A previously stored or transmitted segment similar to the at least one segment is identified. A fingerprint is computed for at least one subsegment. Using the fingerprint for the at least one subsegment, it is determined whether the at least one subsegment is identical to a subsegment of the previously stored or transmitted segment, without directly comparing the content of the at least one subsegment with the content of the subsegment of the previously stored or transmitted segment.
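The fingerprint comparison described above can be sketched as follows. This is an illustration under assumptions: the abstract does not say how segments are split or which fingerprint function is used, so fixed-size subsegments and SHA-256 are stand-ins, and the similar segment is simply passed in rather than searched for.

```python
import hashlib


def split(data: bytes, size: int):
    """Assumed fixed-size splitting; the abstract leaves the method open."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def encode_segment(segment: bytes, similar_segment: bytes, sub_size: int = 4):
    """Replace subsegments already present in the similar segment with references."""
    known = {fingerprint(s): i for i, s in enumerate(split(similar_segment, sub_size))}
    encoded = []
    for sub in split(segment, sub_size):
        fp = fingerprint(sub)
        if fp in known:
            # Match found by fingerprint alone, without comparing content.
            encoded.append(("ref", known[fp]))
        else:
            encoded.append(("raw", sub))
    return encoded


print(encode_segment(b"AAAABBBBCCCC", b"AAAAXXXXCCCC"))
```

Only the subsegment that differs is carried as raw bytes; the others become references into the earlier segment.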
Abstract:
A system for storing data comprises a performance storage system for storing one or more data items. A data item of the one or more data items comprises a data file or a data block. The system further comprises a segment storage system for storing a snapshot of a stored data item of the one or more data items in the performance storage system. Taking the snapshot of the stored data item enables recall of the stored data item as stored at the time of the snapshot. At least one newly stored segment is stored as a reference to a previously stored segment.
Abstract:
A system for processing data includes a data storage device and a processor. The data storage device stores a set of data. The processor is configured to divide the set of data in the data storage system into a set of segments; compute a set of fingerprints, wherein the set of fingerprints comprises a fingerprint for each segment of the set of segments; store the set of fingerprints in a new snapshot; identify a second set of fingerprints in the new snapshot that are not already in a fingerprint index; cause a second set of segments associated with the second set of fingerprints to be stored in a backup data storage system; and cause the second set of fingerprints to be added to the fingerprint index.
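The processing steps listed above can be sketched in a few lines. Fixed-size segments, SHA-256 fingerprints, and the in-memory `set` and `dict` standing in for the fingerprint index and the backup data storage system are assumptions for illustration; the abstract does not specify any of them.

```python
import hashlib


def backup(data: bytes, fingerprint_index: set, backup_store: dict, segment_size: int = 4):
    """Create a snapshot of fingerprints and store only segments not yet indexed."""
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    snapshot = [hashlib.sha256(s).hexdigest() for s in segments]
    for fp, seg in zip(snapshot, segments):
        if fp not in fingerprint_index:
            backup_store[fp] = seg      # new segment goes to the backup store
            fingerprint_index.add(fp)   # and its fingerprint joins the index
    return snapshot


index, store = set(), {}
snap1 = backup(b"AAAABBBB", index, store)
snap2 = backup(b"AAAACCCC", index, store)
print(len(store))  # segments actually stored across both backups
restored = b"".join(store[fp] for fp in snap2)
print(restored)
```

The second backup stores only the one segment that was not already indexed, and each snapshot remains sufficient to rebuild its data from the store.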
Abstract:
A system for processing data comprises a deduplicating system, an interface, and a processor. The deduplicating system stores a copy of data stored in a data storage system by storing a set of segments that is able to reconstruct the data stored in the data storage system. The interface receives an indication to revert data stored in the data storage system to a state of data at a snapshot time stored in the deduplicating system. The processor is configured to determine a subset of the data stored in the data storage system that has changed between the data stored in the data storage system and the state of data at the snapshot time stored in the deduplicating system using a first list of fingerprints associated with the data stored on the data storage system and a second list of fingerprints associated with the state of data at the snapshot time stored in the deduplicating system.
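Determining the changed subset from two fingerprint lists might look like the sketch below. It assumes both lists are ordered by segment position so that entries can be compared index by index; the abstract does not state how the lists are organized, and the function name is hypothetical.

```python
from itertools import zip_longest


def changed_positions(current_fps, snapshot_fps):
    """Return segment positions whose fingerprints differ from the snapshot."""
    return [
        i
        for i, (cur, snap) in enumerate(zip_longest(current_fps, snapshot_fps))
        if cur != snap
    ]


current = ["fp-a", "fp-b", "fp-c"]
snapshot = ["fp-a", "fp-x", "fp-c", "fp-d"]
print(changed_positions(current, snapshot))
```

Only the segments at the returned positions would need to be fetched from the deduplicating system to revert the data.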
Abstract:
A system and method are disclosed for providing efficient data storage. A plurality of data segments is received in a data stream. The system determines whether a data segment has been stored previously in a low latency memory. In the event that the data segment is determined to have been stored previously, an identifier for the previously stored data segment is returned.
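A minimal sketch of the check described above follows. An in-memory dictionary stands in for the low latency memory, SHA-256 is an assumed fingerprint function, and the class and method names are hypothetical.

```python
import hashlib


class SegmentStore:
    def __init__(self):
        self._index = {}     # fingerprint -> identifier (stand-in for low latency memory)
        self._segments = []  # stored segment contents, addressed by identifier

    def put(self, segment: bytes):
        """Return (identifier, was_duplicate) for the segment."""
        fp = hashlib.sha256(segment).digest()
        if fp in self._index:
            # Stored previously: return the existing identifier.
            return self._index[fp], True
        seg_id = len(self._segments)
        self._segments.append(segment)
        self._index[fp] = seg_id
        return seg_id, False


store = SegmentStore()
for seg in [b"alpha", b"beta", b"alpha"]:
    print(store.put(seg))
```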
Abstract:
Cluster storage comprises an interface and a processor. The interface is to send a tag to a selected node and receive tags from the selected node. The tags received from the selected node comprise tags for likely similar segments stored on the selected node. The processor is to break a segment into subsegments, calculate subsegment tags for each subsegment, identify one or more references to one or more previously stored subsegments and/or one or more segment data using the tags from the selected node and the subsegment tags, and send the one or more references to the one or more previously stored subsegments and/or segment data and associated tags to the selected node.
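The exchange between the processor and the selected node can be sketched as below. The `Node` class, its method names, the truncated SHA-256 tag, and fixed-size subsegments are all hypothetical; in particular, this toy node returns every tag it holds, whereas the abstract describes tags for likely similar segments only.

```python
import hashlib


def tag(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()[:16]


class Node:
    def __init__(self):
        self.subsegments = {}  # tag -> subsegment bytes

    def likely_similar_tags(self, segment_tag: str) -> set:
        # Assumption: return all known tags; a real node would narrow this by segment_tag.
        return set(self.subsegments)

    def receive(self, items) -> bytes:
        """Store new subsegment data, then rebuild the segment from tags."""
        for kind, t, payload in items:
            if kind == "data":
                self.subsegments[t] = payload
        return b"".join(self.subsegments[t] for _, t, _ in items)


def send_segment(segment: bytes, node: Node, sub_size: int = 4):
    """Build a message of references (known tags) and data (unknown tags)."""
    known = node.likely_similar_tags(tag(segment))
    items = []
    for i in range(0, len(segment), sub_size):
        sub = segment[i:i + sub_size]
        t = tag(sub)
        if t in known:
            items.append(("ref", t, None))
        else:
            items.append(("data", t, sub))
            known.add(t)  # later repeats within this segment become references
    return items


node = Node()
node.receive(send_segment(b"AAAABBBB", node))
message = send_segment(b"AAAACCCC", node)
print([kind for kind, _, _ in message])
print(node.receive(message))
```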