Abstract:
In some embodiments, a method for die-level monitoring is provided. The method includes distributing user data throughout a plurality of storage nodes through erasure coding, wherein the plurality of storage nodes are housed within a chassis that couples the storage nodes. Each of the storage nodes has a non-volatile solid-state storage with non-volatile memory and the user data is accessible via the erasure coding from a remainder of the storage nodes in event of two of the storage nodes being unreachable. The method includes producing diagnostic information that diagnoses the non-volatile memory on a basis of per package, per die, per plane, per block, or per page, the producing performed by each of the plurality of storage nodes. The method includes writing the diagnostic information to a memory in the storage cluster.
Abstract:
A method for processing blocks of flash memory to decrease raw bit errors from the flash memory is provided. The method includes identifying one or more blocks of the flash memory for a refresh operation and writing information regarding the identified blocks, to a data structure. The method includes issuing background reads to the identified blocks, according to the data structure, as the refresh operation. The method may be embodied on a computer readable medium. In some embodiments the background reads may be based on a time based refresh responsive to an increase in raw bit error count in the flash memory over time.
Abstract:
A storage cluster is provided. The storage cluster includes a plurality of storage nodes, each of the plurality of storage nodes having nonvolatile solid-state memory and a plurality of operations queues coupled to the solid-state memory. The plurality of storage nodes is configured to distribute the user data and metadata throughout the plurality of storage nodes such that the plurality of storage nodes can access the user data with a failure of two of the plurality of storage nodes. Each of the plurality of storage nodes is configured to determine whether a read of 1 or more bits in the solid-state memory via a first path is within a latency budget. The plurality of storage nodes is configured to perform a read of user data or metadata via a second path, responsive to a determination that the read of the bit via the first path is not within the latency budget.
Abstract:
A plurality of storage nodes within a single chassis is provided. The plurality of storage nodes is configured to communicate together as a storage cluster. The plurality of storage nodes has a non-volatile solid-state storage for user data storage. The plurality of storage nodes is configured to distribute the user data and metadata associated with the user data throughout the plurality of storage nodes, with erasure coding of the user data. The plurality of storage nodes is configured to recover from failure of two of the plurality of storage nodes by applying the erasure coding to the user data from a remainder of the plurality of storage nodes. The plurality of storage nodes is configured to detect an error and engage in an error recovery via one of a processor of one of the plurality of storage nodes, a processor of the non-volatile solid state storage, or the flash memory.
Abstract:
A plurality of storage nodes within a single chassis is provided. The plurality of storage nodes is configured to communicate together as a storage cluster. The plurality of storage nodes has a non-volatile solid-state storage for user data storage. The plurality of storage nodes is configured to distribute the user data and metadata associated with the user data throughout the plurality of storage nodes, with erasure coding of the user data. The plurality of storage nodes is configured to recover from failure of two of the plurality of storage nodes by applying the erasure coding to the user data from a remainder of the plurality of storage nodes. The plurality of storage nodes is configured to detect an error and engage in an error recovery via one of a processor of one of the plurality of storage nodes, a processor of the non-volatile solid state storage, or the flash memory.
Abstract:
A storage system is provided. The storage system include a primary storage node that includes a primary processing device. The primary storage node is communicatively coupled to a secondary storage node. The secondary storage node includes a secondary processing device and a set of non-volatile memory modules. The primary processing device is to identify one or more storage operations to be performed on the set of non-volatile memory modules of the secondary storage node and transmit one or more instructions to the secondary storage node to perform the one or more storage operations, the one or more storage operations performed by the secondary processing device.
Abstract:
A storage cluster with disaggregated compute resources and storage memory is provided. The storage cluster includes a plurality of blades coupled as the storage cluster, each of at least a subset of the plurality of blades having solid-state storage memory therein. The storage cluster includes a switch that direct network-connects a plurality of processors, as compute resources in the plurality of blades, and the solid-state storage memory in each of the at least a subset of the plurality of blades, wherein the compute resources and the solid-state storage memory are disaggregated in the storage cluster.
Abstract:
An indication is received from a storage device that an attempt to read a portion of data from a block of the storage device has failed. A command is transmitted to the storage device to perform a scan on data stored at the block comprising the portion of data to acquire failure information associated with a plurality of subsets of the data stored at the block. The failure information associated with the plurality of subsets of the data stored at the block is received from the storage device.
Abstract:
An apparatus for artificial intelligence acceleration is provided. The apparatus includes a storage and compute system having a distributed, redundant key value store for metadata. The storage and compute system having distributed compute resources configurable to access, through a plurality of authorities, data in the solid-state memory, run inference with a deep learning model, generate vectors for the data and store the vectors in the key value store.
Abstract:
A method for adjustable error correction in a storage cluster is provided. The method includes determining health of a non-volatile memory of a non-volatile solid-state storage unit of each of a plurality of storage nodes in a storage cluster on a basis of per flash package, per flash die, per flash plane, per flash block, or per flash page. The determining is performed by the storage cluster. The plurality of storage nodes is housed within a chassis that couples the storage nodes as the storage cluster. The method includes adjusting erasure coding across the plurality of storage nodes based on the health of the non-volatile memory and distributing user data throughout the plurality of storage nodes through the erasure coding. The user data is accessible via the erasure coding from a remainder of the plurality of storage nodes if any of the plurality of storage nodes are unreachable.