Abstract:
An embodiment of a method of estimating storage system reliability begins with a first step of modeling a storage system design in operation under a workload to determine the locations of retrieval points. The retrieval points provide sources for primary storage recovery for a plurality of failure scenarios. The method continues with a second step of finding the most recent retrieval point, relative to a target recovery time, that is available for recovery for a particular failure scenario. In a third step, a difference between the target recovery time and the retrieval point creation time for the most recent retrieval point is determined. The difference indicates a data loss time period.
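The second and third steps can be illustrated with a minimal Python sketch. It assumes the modeling step has already produced the creation times of the retrieval points that survive a given failure scenario; the function name `data_loss_period` and the use of plain numbers for timestamps are hypothetical choices for illustration, not part of the described method.

```python
from bisect import bisect_right


def data_loss_period(creation_times, target_time):
    """Return the data loss time period for one failure scenario.

    creation_times: creation times of the retrieval points assumed to be
        available for recovery in this scenario (hypothetical input).
    target_time: the target recovery time, in the same units.
    Returns None when no retrieval point exists at or before target_time.
    """
    times = sorted(creation_times)
    # Index of the first retrieval point created strictly after the target.
    idx = bisect_right(times, target_time)
    if idx == 0:
        return None  # nothing old enough to recover from
    most_recent = times[idx - 1]
    return target_time - most_recent
```

For example, with retrieval points created at times 0, 60, 120 and 180 and a target recovery time of 150, the most recent usable point is 120 and the data loss period is 30.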
Abstract:
Provided is a method for determining a recovery schedule. The method includes accepting as input a recovery graph. The recovery graph presents one or more strategies for data recovery. In addition, at least one objective is provided and accepted. The recovery graph is formalized as an optimization problem for the provided objective. When formalized as an optimization problem, at least one solution technique is applied to determine at least one recovery schedule.
Abstract:
The present invention provides techniques for assignment and layout of redundant data in a data storage system. In one aspect, the data storage system stores a number M of replicas of the data. Nodes that have sufficient resources available to accommodate a requirement of data to be assigned to the system are identified. When the number of identified nodes is greater than M, the data is assigned to M randomly selected nodes from among those identified. The data to be assigned may include a group of data segments; when the number of identified nodes is less than M, the group is divided to form a group of data segments having a reduced requirement. Nodes are then identified that have sufficient resources available to accommodate the reduced requirement. In other aspects, techniques are provided for adding a new storage device node to a data storage system having a plurality of existing storage device nodes and for removing data from a storage device node in such a data storage system.
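The node selection described above might look roughly like the following Python sketch. The names `assign_replicas` and `free_capacity`, and the reduction of a "requirement" to a single number, are assumptions made for illustration. The abstract does not say what happens when exactly M nodes qualify; this sketch simply uses all of them.

```python
import random


def assign_replicas(free_capacity, requirement, m, rng=random):
    """Pick M nodes at random among those that can hold the data.

    free_capacity: dict mapping node name -> available resources
        (hypothetical single-number model of node resources).
    requirement: resources needed by the data being assigned.
    m: number of replicas to store.
    Returns a list of M node names, or None when fewer than M nodes
    qualify, in which case the caller would divide the group of data
    segments to reduce the requirement and try again.
    """
    candidates = [node for node, free in free_capacity.items()
                  if free >= requirement]
    if len(candidates) < m:
        return None
    # random.sample draws M distinct nodes without replacement.
    return rng.sample(candidates, m)
```

Passing a seeded `random.Random` instance as `rng` makes the placement reproducible, which can help when testing a layout policy.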
Abstract:
An embodiment of a method of operating a distributed storage system includes reading m data blocks from a distributed cache. The distributed cache comprises memory of a plurality of independent computing devices that include redundancy for the m data blocks. The m data blocks and p parity blocks are stored across m plus p independent computing devices. Each of the m plus p independent computing devices stores a single block selected from the m data blocks and the p parity blocks.
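The placement of m data blocks and p parity blocks on m plus p devices, one block per device, can be sketched in Python as follows. The abstract does not name an erasure code, so this example assumes the simplest case of p = 1 with XOR parity; `xor_parity` and `place_stripe` are hypothetical helper names.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))


def place_stripe(data_blocks, devices):
    """Map each of the m data blocks and the single parity block
    (assumed p = 1) to its own device, one block per device."""
    blocks = list(data_blocks) + [xor_parity(data_blocks)]
    if len(devices) != len(blocks):
        raise ValueError("need exactly m + p devices")
    return dict(zip(devices, blocks))


data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
layout = place_stripe(data, ["dev0", "dev1", "dev2", "dev3"])
# Any one lost block can be rebuilt by XOR-ing the remaining three.
rebuilt = xor_parity([layout["dev0"], layout["dev2"], layout["dev3"]])
```

Under this assumption, `rebuilt` equals the block that was stored on `dev1`, which is the redundancy property the one-block-per-device layout is meant to preserve.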
Abstract:
A computer storage system includes a controller and a storage device array. The storage device array may include a first sub-array and a fast storage device sub-array. The first sub-array includes one or more first storage devices storing data. The fast storage device sub-array includes one or more fast storage devices storing a copy of the data stored in the first sub-array.
Abstract:
An embodiment of a method of caching data writes data units into a write cache for eventual flushing to storage. The method sets a copy-to-read-cache flag for each particular data unit that is read from the write cache. Upon flushing each data unit to the storage, the method copies the data unit to a read cache if the flag for the data unit is set. Another embodiment of a method of caching data writes data units into a write cache. The method simulates a transfer policy for copying the data units from the write cache to a read cache to determine a performance indicator for the transfer policy. Upon flushing each data unit, the method copies the data unit to the read cache if the performance indicator exceeds a threshold and the transfer policy includes copying the data unit into the read cache.
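The first embodiment, the copy-to-read-cache flag, can be sketched in Python as below. The class name `FlaggedWriteCache`, the dictionary-based caches, and the choices to reset the flag and invalidate any stale read-cache copy when a unit is overwritten are all assumptions made for this illustration; the abstract does not specify them.

```python
class FlaggedWriteCache:
    """Toy write cache that promotes data units to a read cache on
    flush only if they were read while still in the write cache."""

    def __init__(self):
        self.write_cache = {}  # key -> [data, copy_to_read_cache flag]
        self.read_cache = {}
        self.storage = {}

    def write(self, key, data):
        # Assumption: a fresh write clears the flag and drops any
        # stale copy of the unit from the read cache.
        self.write_cache[key] = [data, False]
        self.read_cache.pop(key, None)

    def read(self, key):
        if key in self.write_cache:
            entry = self.write_cache[key]
            entry[1] = True  # set the copy-to-read-cache flag
            return entry[0]
        if key in self.read_cache:
            return self.read_cache[key]
        return self.storage.get(key)

    def flush(self):
        for key, (data, flagged) in self.write_cache.items():
            self.storage[key] = data
            if flagged:
                self.read_cache[key] = data
        self.write_cache.clear()
```

A unit that is written and then read before the flush ends up in both storage and the read cache; a unit that is written but never read goes to storage only.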
Abstract:
An embodiment of a method of cooperative caching for a distributed storage system begins with a step of requesting data from the storage devices that hold the data. The method continues with a step of receiving, from the storage devices, any cached blocks together with expected response times for providing non-cached blocks. The method concludes with a step of requesting a sufficient number of the non-cached blocks from the one or more particular storage devices that provide an expectation of optimal performance.
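The final step can be illustrated with a short Python sketch. It assumes that "optimal performance" means fetching the missing blocks from the devices reporting the smallest expected response times, which is one plausible reading and not something the abstract states; `choose_devices` and its parameters are hypothetical.

```python
def choose_devices(cached_count, expected_times, blocks_needed):
    """Choose which devices to ask for non-cached blocks.

    cached_count: number of cached blocks already received.
    expected_times: dict mapping device -> expected response time for
        providing its non-cached block.
    blocks_needed: total number of blocks required to serve the request.
    Returns the devices with the lowest expected response times, just
    enough of them to make up the shortfall.
    """
    missing = max(0, blocks_needed - cached_count)
    if missing > len(expected_times):
        raise ValueError("not enough devices to satisfy the request")
    ranked = sorted(expected_times, key=expected_times.get)
    return ranked[:missing]
```

With two cached blocks in hand, four blocks needed, and expected times of 30 ms, 5 ms and 12 ms from three devices, the sketch selects the 5 ms and 12 ms devices.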
Abstract:
A method and apparatus are used to divide a storage volume into shards. The division uses a directed graph having a vertex for each block in the storage volume and directed edges between pairs of vertices, each edge representing a shard of blocks. The method associates with each directed edge a weight that represents the dissimilarity of the shard of blocks between the corresponding pair of vertices, selects a maximum number of shards (K) for dividing the storage volume, identifies a minimum aggregate weight associated with a current vertex for a combination of no more than K shards, and repeats this identification for the vertices in the directed graph. The smallest aggregate weight associated with the last vertex is then picked to determine a sharding that spans the storage volume and provides a minimal dissimilarity among no more than K shards of blocks.
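The minimum-aggregate-weight search can be sketched as a small dynamic program in Python. This is an illustration under stated assumptions: vertices are modeled as the n + 1 block boundaries, so that edge (i, j) stands for the shard of blocks i through j - 1, and the `dissimilarity` callback is a hypothetical stand-in for whatever edge weight the method uses.

```python
def best_sharding(n_blocks, max_shards, dissimilarity):
    """Return (aggregate_weight, shards) using at most max_shards shards.

    dissimilarity(i, j): weight of the edge for the shard covering
        blocks i .. j-1 (hypothetical callback).
    shards: list of (start, end) half-open block ranges.
    """
    if n_blocks < 1 or max_shards < 1:
        raise ValueError("need at least one block and one shard")
    inf = float("inf")
    # cost[k][j]: minimum aggregate weight covering blocks [0, j)
    # with exactly k shards; prev[k][j] remembers the last cut point.
    cost = [[inf] * (n_blocks + 1) for _ in range(max_shards + 1)]
    prev = [[None] * (n_blocks + 1) for _ in range(max_shards + 1)]
    cost[0][0] = 0.0
    for k in range(1, max_shards + 1):
        for j in range(1, n_blocks + 1):
            for i in range(j):
                if cost[k - 1][i] == inf:
                    continue
                candidate = cost[k - 1][i] + dissimilarity(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    prev[k][j] = i
    # Smallest aggregate weight at the last vertex over 1..K shards.
    best_k = min(range(1, max_shards + 1),
                 key=lambda k: cost[k][n_blocks])
    shards, j = [], n_blocks
    for k in range(best_k, 0, -1):
        i = prev[k][j]
        shards.append((i, j))
        j = i
    shards.reverse()
    return cost[best_k][n_blocks], shards
```

As a usage example, if each block carries a value from `[1, 1, 1, 9, 9, 9]` and the dissimilarity of a shard is the spread between its largest and smallest value, a limit of K = 2 yields the shards (0, 3) and (3, 6) with zero aggregate weight. The triple loop runs in O(K n²) time, which is adequate for a sketch but may need pruning for large volumes.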
Abstract:
Method and apparatus for distributing storage requests referencing a replicated data set to heterogeneous storage arrays. A workload includes related storage requests that have a common quality-of-service requirement. The performance levels of the storage arrays are monitored in processing the storage requests. The performance levels and quality-of-service requirements are used for distributing the storage requests between the storage arrays.
Abstract:
A method of reading data comprises sending read messages to storage devices holding the stripe and receiving at least a quorum of reply messages. The reply message from the storage device holding the data block includes the data block. The quorum meets a quorum condition of a number such that any two selections of the number of stripe blocks intersect in the minimum number of the stripe blocks needed to decode the stripe. A method of writing data comprises sending query messages to storage devices holding the stripe, receiving a query reply message from each of at least a first quorum of the storage devices, sending modify messages to the storage devices, and receiving a write reply message from each of at least a second quorum of the storage devices. The first and second quorums each meet the quorum condition.
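The quorum condition can be made concrete with a short Python sketch. For a stripe of n blocks of which at least m are needed to decode, two selections of q blocks overlap in at least 2q - n blocks, so the condition holds when 2q - n >= m. The function names below are hypothetical, and the brute-force check is included only to confirm the arithmetic on small cases.

```python
from itertools import combinations


def min_quorum_size(n, m):
    """Smallest q with 2q - n >= m, i.e. ceil((n + m) / 2)."""
    return (n + m + 1) // 2


def meets_quorum_condition(q, n, m):
    """True if any two q-block selections share at least m blocks."""
    return q <= n and 2 * q - n >= m


def worst_case_overlap(q, n):
    """Brute-force the smallest intersection of two q-block selections
    out of n stripe blocks (only practical for small n)."""
    subsets = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in subsets for b in subsets)
```

For a stripe of n = 5 blocks with m = 3 needed to decode, the smallest quorum is 4: any two selections of 4 blocks share at least 3 blocks, whereas two selections of 3 blocks may share only 1.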