Abstract:
An object based file system for storing and accessing objects is disclosed. The file system may be implemented as a method in hardware, firmware, software, or a combination thereof. The method may include receiving from an application program an object write request. A storage node on which to store the object may be selected, which may include identifying a least busy storage node and/or a least full storage node. The object and the object write request may be sent to the selected storage node. A write success message may be received from the selected storage node. The successful writing of the object may be reported to the application program.
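By way of illustration only, the node selection step above might be sketched as follows in Python; the node attributes (pending_ops, bytes_used, capacity) and the equal weighting of the "least busy" and "least full" criteria are assumptions rather than details taken from the abstract.

    from dataclasses import dataclass

    @dataclass
    class StorageNode:
        node_id: str
        pending_ops: int    # assumed proxy for how busy the node is
        bytes_used: int
        capacity: int

    def select_node(nodes, busy_weight=0.5):
        """Combine 'least busy' and 'least full' into one score and pick the
        node with the lowest score; the normalisation and weighting are
        illustrative choices, not part of the described system."""
        def score(n):
            busiest = max(m.pending_ops for m in nodes) or 1
            return (busy_weight * n.pending_ops / busiest
                    + (1 - busy_weight) * n.bytes_used / n.capacity)
        return min(nodes, key=score)

    # Hypothetical usage:
    nodes = [StorageNode("n1", 4, 500, 1000), StorageNode("n2", 1, 900, 1000)]
    print(select_node(nodes).node_id)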
Abstract:
A resilient distributed replicated data storage system is described herein. The storage system includes zones that are independent and autonomous from each other. The zones include nodes that are independent and autonomous. The nodes include storage devices. When a data item is stored, it is partitioned into a plurality of data objects and a plurality of parity objects are calculated. Reassembly instructions are created for the data item. The data objects, parity objects and reassembly instructions are spread across nodes and zones in the storage system according to a policy for the data item. When a zone is inaccessible, a virtual zone is created and used until the intended zone is available. When a read request is received, the data item is prepared from the lowest latency nodes according to the reassembly instructions, and a virtual zone is accessed in place of a real zone when the real zone is inaccessible.
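A minimal sketch of the partition-and-parity step, assuming fixed-size chunks and a single XOR parity object; the abstract does not specify the erasure code, the chunk size, or the format of the reassembly instructions, so all of these are illustrative.

    def partition_with_parity(data: bytes, k: int = 4):
        """Split a data item into k equal data objects plus one XOR parity
        object, and build reassembly instructions for the data item."""
        chunk = -(-len(data) // k)                  # ceiling division
        data_objs = [data[i * chunk:(i + 1) * chunk].ljust(chunk, b"\0")
                     for i in range(k)]
        parity = bytearray(chunk)
        for obj in data_objs:
            for i, b in enumerate(obj):
                parity[i] ^= b
        instructions = {"order": list(range(k)), "chunk": chunk, "length": len(data)}
        return data_objs, [bytes(parity)], instructions

    def reassemble(data_objs, instructions):
        """Rebuild the data item when all data objects are readable."""
        return b"".join(data_objs)[: instructions["length"]]

    data_objs, parity_objs, instructions = partition_with_parity(b"hello world!", k=4)
    print(reassemble(data_objs, instructions))

In a real deployment the data objects, parity objects and instructions would be spread across nodes and zones per the data item's policy; the sketch only shows the encoding and decoding arithmetic.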
Abstract:
The system and routine for data caching leverage the properties of Network-Attached Non-Volatile Memories (NANVMs) to provide virtualized, secure, node-local storage services to network users with reduced data movement across the NANVMs. The caching routine reserves storage resources (storage partitions) on NANVM devices, migrates data required for the target application execution to the allocated storage partitions, and directs the network clients to dynamically “mount” to the storage partitions based on application data requirements. Only those clients and applications that present valid credentials and satisfactory computing capabilities can access the data in the specific storage partitions. Several clients can have access to the same storage partitions without duplicating or replicating the data. A Global Data Indexing sub-system supports the efficient operation of the subject system. The Global Data Indexing sub-system provides mapping between the storage partitions, data sets, applications, and client nodes, as well as their credentials/capabilities.
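The credential-gated "mount" step might be modelled roughly as below; the in-memory index standing in for the Global Data Indexing sub-system, the token field, and the minimum-cores capability check are all hypothetical.

    # Hypothetical stand-in for the Global Data Indexing sub-system:
    PARTITION_INDEX = {
        "part-7": {"dataset": "genomes", "required_token": "abc123", "min_cores": 8},
    }

    def mount_partition(client, partition_id):
        """Grant access to a storage partition only if the client presents
        valid credentials and satisfactory computing capabilities; several
        clients may mount the same partition without the data being copied."""
        entry = PARTITION_INDEX.get(partition_id)
        if entry is None:
            raise KeyError(f"unknown partition {partition_id}")
        if client["token"] != entry["required_token"]:
            raise PermissionError("invalid credentials")
        if client["cores"] < entry["min_cores"]:
            raise PermissionError("insufficient computing capability")
        return {"partition": partition_id, "dataset": entry["dataset"]}

    print(mount_partition({"token": "abc123", "cores": 16}, "part-7"))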
Abstract:
Asynchronous namespace maintenance in a distributed replicated data storage system is disclosed. An access device/program serving as a front end to the distributed replicated data storage system adds updated meta data about stored data items to a batch as data items are stored in the distributed replicated data storage system. When the elapsed time since the last batch of data item meta data was stored exceeds a first threshold value or the current batch size exceeds a second threshold value, the access device/program stores the current batch of updated meta data as an object in the distributed replicated data storage system, receives a batch object identifier for the stored batch of updated meta data, and distributes the batch object identifier to other access devices and/or access programs, which retrieve the batch of updated meta data and update their namespaces.
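A sketch of the two-threshold flush logic in Python; the threshold values and the store_object and broadcast callables are placeholders for the storage and distribution steps named in the abstract.

    import time

    class NamespaceBatcher:
        """Accumulate meta data updates and flush them as a single stored
        object when either the elapsed-time threshold or the batch-size
        threshold is exceeded."""

        def __init__(self, store_object, broadcast, max_age_s=5.0, max_entries=1000):
            self.store_object = store_object   # assumed to return a batch object identifier
            self.broadcast = broadcast         # assumed to notify other access devices/programs
            self.max_age_s = max_age_s
            self.max_entries = max_entries
            self.batch = []
            self.last_flush = time.monotonic()

        def add(self, metadata_entry):
            self.batch.append(metadata_entry)
            too_old = time.monotonic() - self.last_flush > self.max_age_s
            too_big = len(self.batch) > self.max_entries
            if too_old or too_big:
                batch_object_id = self.store_object(self.batch)
                self.broadcast(batch_object_id)
                self.batch = []
                self.last_flush = time.monotonic()

    # Hypothetical usage with stand-in callables:
    b = NamespaceBatcher(lambda batch: "obj-001", print, max_entries=2)
    for entry in ({"path": "/a"}, {"path": "/b"}, {"path": "/c"}):
        b.add(entry)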
Abstract:
In the data storage system, the storage area network performs XOR operations on incoming data for parity generation without buffering data through a centralized RAID engine or processor. The hardware for calculating the XOR data is distributed to incrementally calculate data parity in parallel across each data channel and may be implemented as a set of FPGAs with low bandwidths to efficiently scale as the amount of storage memory increases. A host adaptively appoints data storage controllers in the storage area network to perform XOR parity operations on data passing therethrough. The system provides data migration and parity generation in a simple and effective manner and attains a reduction in cost and power consumption.
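The per-channel incremental XOR can be pictured with a small software model; in the described system this work would be done by FPGA hardware on each data channel, and the block size and example values here are arbitrary.

    class ChannelParity:
        """Model of one data channel's XOR accumulator: parity is updated
        incrementally as each block passes through, so no centralized RAID
        engine or processor has to buffer the data."""

        def __init__(self, block_size):
            self.parity = bytearray(block_size)

        def absorb(self, block: bytes):
            for i, b in enumerate(block):
                self.parity[i] ^= b
            return block   # the block continues to its storage device unchanged

    ch = ChannelParity(4)
    for blk in (b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\xff\x00"):
        ch.absorb(blk)
    print(ch.parity.hex())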
Abstract:
The present invention is directed to data migration, and particularly Parity Group migration, between high performance data generating entities and a data storage structure in which distributed NVM arrays are used as a single intermediate logical storage. Such storage requires a global registry/addressing capability that facilitates the storage and retrieval of the locality information (metadata) for any given fragment of unstructured data. A Parity Group Identifier and Parity Group Information (PGI) descriptors, which track the members of the Parity Groups, are created and distributed in the intermediate distributed NVM arrays as part of the non-deterministic data addressing system to ensure coherency and fault tolerance for the data and the metadata. The PGI descriptors act as collection points for state describing the residency and replay status of members of the Parity Groups.
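One way to picture the PGI descriptor's role as a collection point for residency and replay state is the sketch below; the field names, states, and methods are illustrative and not taken from the abstract.

    from dataclasses import dataclass, field

    @dataclass
    class PGIDescriptor:
        """Illustrative Parity Group Information descriptor: for each member
        of a Parity Group it records which NVM node currently holds the
        member (residency) and whether it has been replayed to backing storage."""
        pgi_id: int
        members: dict = field(default_factory=dict)  # member_id -> {"node": ..., "replayed": ...}

        def record_write(self, member_id, node):
            self.members[member_id] = {"node": node, "replayed": False}

        def mark_replayed(self, member_id):
            self.members[member_id]["replayed"] = True

        def fully_replayed(self):
            return all(m["replayed"] for m in self.members.values())

    pgi = PGIDescriptor(pgi_id=42)
    pgi.record_write("frag-0", "nvm-node-3")
    pgi.mark_replayed("frag-0")
    print(pgi.fully_replayed())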
Abstract:
A data storage system allowing for ingest of data when certain storage is unavailable is described herein. The storage system includes zones that are independent and autonomous from each other. The zones include nodes that are independent and autonomous. The nodes include storage devices. When data is to be stored in the data storage system according to a specified storage policy and the specified storage policy cannot be achieved, the data is stored according to a fallback storage policy. This allows a client to continue executing without having to wait for a storage anomaly to be corrected or to pass. After the data is stored according to the fallback storage policy, the data is at a later time stored according to the specified storage policy.
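A compact sketch of the fallback behaviour, assuming a try_store callable that raises when a policy cannot currently be achieved and a queue of items to re-store later; both interfaces are assumptions.

    import queue

    repair_queue = queue.Queue()   # items to re-store later under their intended policy

    def store_with_fallback(data, specified_policy, fallback_policy, try_store):
        """Store under the specified policy if possible; otherwise store under
        the fallback policy so the client can proceed, and remember to store
        the data under the specified policy at a later time."""
        try:
            return try_store(data, specified_policy)
        except IOError:
            receipt = try_store(data, fallback_policy)
            repair_queue.put((receipt, specified_policy))
            return receipt

A background task would then drain repair_queue once the specified policy can again be satisfied.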
Abstract:
Data storage systems and methods for storing data are described herein. The storage system includes a first storage node configured to issue a first delivery request to a first set of other storage nodes in the storage system, the first delivery request including a first at least one data operation for each of the first set of other storage nodes, and to issue at least one other delivery request, while the first delivery request remains outstanding, the at least one other delivery request including a first commit request for each of the first set of other storage nodes. The first node causes the first at least one data operation to be made active within the storage system in response to receipt of a commit indicator along with a delivery acknowledgement regarding one of the at least one other delivery request.
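The overlap between the data-carrying delivery request and the later commit request can be sketched as message handling on a receiving node; the message shapes and the commit indicator returned with the delivery acknowledgement are assumptions.

    class ReceivingNode:
        """Illustrative receiver: data operations arrive in a delivery request
        and are staged; a later delivery request carrying a commit request
        makes them active, and the acknowledgement carries a commit indicator."""

        def __init__(self):
            self.staged = {}   # delivery_id -> staged data operations
            self.active = []   # operations made active within the store

        def on_delivery(self, delivery_id, data_ops=None, commit_for=None):
            ack = {"delivery_id": delivery_id, "ack": True}
            if data_ops:
                self.staged[delivery_id] = data_ops           # first delivery request
            if commit_for in self.staged:
                self.active.extend(self.staged.pop(commit_for))
                ack["commit_indicator"] = commit_for          # signals the ops are now active
            return ack

    node = ReceivingNode()
    node.on_delivery("d1", data_ops=["put k1 v1"])
    print(node.on_delivery("d2", commit_for="d1"))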
Abstract:
Data storage systems and methods for storing data are described herein. The storage system includes at least two data storage nodes for storing portions of a distributed hash table and related data. After a first node attempts to complete a write request at a second node and is unable to complete the request, the first node ceases responses to interactions from other nodes. Once the first node's failure to respond has caused a sufficient number of nodes to cease responding, the nodes enter a service mode to resolve the live lock. While in live lock, the nodes determine the oldest unfulfilled request using a system-wide logical timestamp associated with write requests. Once the oldest request is determined, a removal vote to remove the non-responsive node from the group is initiated and, if other nodes agree, the non-responsive node is removed from the group of nodes.
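The service-mode steps (identify the oldest unfulfilled request by logical timestamp, then vote on removing the non-responsive node) might be modelled as below; the quorum rule, the request record shape, and the way votes are counted are assumptions.

    def resolve_live_lock(pending_requests, votes_needed):
        """pending_requests: records with 'logical_ts', 'target_node' and
        'fulfilled' fields (assumed shape). Returns the node to remove if
        enough peers agree on the oldest unfulfilled request, else None."""
        unfulfilled = [r for r in pending_requests if not r["fulfilled"]]
        if not unfulfilled:
            return None
        oldest = min(unfulfilled, key=lambda r: r["logical_ts"])
        suspect = oldest["target_node"]
        votes = sum(1 for r in unfulfilled if r["target_node"] == suspect)  # assumed voting proxy
        return suspect if votes >= votes_needed else None

    reqs = [
        {"logical_ts": 17, "target_node": "node-B", "fulfilled": False},
        {"logical_ts": 21, "target_node": "node-B", "fulfilled": False},
        {"logical_ts": 9,  "target_node": "node-C", "fulfilled": True},
    ]
    print(resolve_live_lock(reqs, votes_needed=2))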