Abstract:
For disaster recovery involving a first site and a disaster recovery site, where at least a portion of management service metadata is not isolated within the management service, a failover process is initiated, including creating an initial snapshot of the distributed metadata state. In a failback process, a representation is created of state changes for the management service and a delta description is calculated therefrom. The delta description is transmitted to the first site, and a reverse replica of all the workload components from the disaster recovery site is created at the first site. The delta description is played back to restore the distributed metadata state that existed in the disaster recovery site and to re-create it in the first site.
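A minimal sketch of the failback delta flow above, in Python; the dictionary model and the names snapshot, compute_delta, and replay_delta are illustrative assumptions rather than the claimed implementation.

```python
def snapshot(metadata: dict) -> dict:
    """Capture the distributed metadata state at failover time (the baseline)."""
    return dict(metadata)

def compute_delta(baseline: dict, current: dict) -> dict:
    """Describe how the disaster recovery site's metadata diverged from the baseline."""
    return {
        "added_or_updated": {k: v for k, v in current.items() if baseline.get(k) != v},
        "removed": [k for k in baseline if k not in current],
    }

def replay_delta(first_site: dict, delta: dict) -> None:
    """Play the delta back at the first site to re-create the DR-site state there."""
    first_site.update(delta["added_or_updated"])
    for key in delta["removed"]:
        first_site.pop(key, None)
```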
Abstract:
A technique is provided for accumulating failures. A failure of a first row is detected in a group of array macros, the first row having first row address values. A mask has mask bits corresponding to each of the first row address values. The mask bits are initially in active status. A failure of a second row, having second row address values, is detected. When none of the first row address values matches the second row address values, and when the mask bits are all in the active status, the array macros are determined to be bad. When at least one of the first row address values matches the second row address values, mask bits that correspond to at least one of the first row address values that match are kept in active status, and mask bits that correspond to non-matching first row address values are set to inactive status.
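The accumulation rule lends itself to a small sketch. The following Python, which assumes each row address is a fixed-length tuple of address values, is illustrative only:

```python
def accumulate_failure(first_row, second_row, mask):
    """Update the mask for a second failing row; returns (new_mask, macros_bad).

    mask[i] is True (active) while address value i remains a shared-failure candidate.
    """
    matches = [a == b for a, b in zip(first_row, second_row)]
    if not any(matches) and all(mask):
        return mask, True                     # no shared address value: macros are bad
    # Keep mask bits active only where the address values match.
    return [m and match for m, match in zip(mask, matches)], False

# Example: only the second address value is shared, so only that bit stays active.
mask, bad = accumulate_failure((3, 7, 1), (5, 7, 2), [True, True, True])
assert mask == [False, True, False] and bad is False
```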
Abstract:
In an approach for taking corrupt portions of cache offline during runtime, a notification of a section of a cache to be taken offline is received, wherein the section includes one or more sets in one or more indexes of the cache. An indication is associated with each set of the one or more sets in a first index of the one or more indexes, wherein the indication marks the respective set as unusable for future operations. Data is purged from the one or more sets in the first index of the cache. Each set of the one or more sets in the first index is marked as invalid.
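One way to picture the sequence is the following Python sketch of a set-associative cache modeled as cache[index][set]; the field names are assumptions, not the claimed structure.

```python
def take_sets_offline(cache, index, sets_to_offline):
    # Associate an indication with each named set in the first index, marking it
    # unusable for future operations.
    for s in sets_to_offline:
        cache[index][s]["unusable"] = True
    # Purge the data held by those sets, then mark each set invalid.
    for s in sets_to_offline:
        cache[index][s]["data"] = None
        cache[index][s]["valid"] = False
```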
Abstract:
A processor includes a plurality of processing sections, each of which executes a predetermined process. A plurality of fault detecting circuits are respectively provided for the plurality of processing sections to detect a fault in one of the processing sections, referred to as the fault processing section, and to generate a fault detection signal. A fault monitoring and control section controls a normal processing section, i.e., at least one of the plurality of processing sections other than the fault processing section, to execute a relieving process in response to the fault detection signal. The relieving process is determined based on a process load of the fault processing section, a process load of the normal processing section, and priority levels of processes to be executed by the fault processing section and the normal processing section.
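A rough sketch, under assumed data structures, of how a relieving process might be selected from the loads and priorities the abstract names; the specific policy shown is an assumption, not the disclosed one.

```python
def plan_relief(faulty_procs, normal_load, capacity=1.0):
    """faulty_procs: list of (priority, load) pairs for the fault processing section.

    Returns the processes a normal processing section takes over, chosen so that the
    highest-priority work moves first and the normal section's capacity is not exceeded.
    """
    headroom = capacity - normal_load
    relieved = []
    for priority, load in sorted(faulty_procs, reverse=True):
        if load <= headroom:
            relieved.append((priority, load))
            headroom -= load
    return relieved
```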
Abstract:
An improved scalable object storage system includes methods and systems allowing multiple clusters to work together. Users working with a first cluster, or with a multi-cluster gateway, can ask for services and have the request or data transparently proxied to a second cluster. This gives transparent cross-cluster replication, as well as multi-cluster compute or storage farms based upon spot availability or various provisioning policies. Vendors providing a cloud storage “frontend” can provide multiple backends simultaneously. In one embodiment, a multi-cluster gateway can have a two-, three-, or higher-level ring that transparently matches an incoming request with the correct cluster. In the ring, a request is first mapped to an abstract “partition” based on a consistent hash function, and then one or more constrained mappings map the partition number to an actual resource. In another embodiment, the multi-cluster gateway is a dumb gateway, and the rings are located only at the cluster level.
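The two-step ring lookup can be sketched in Python as follows; the MD5-based partitioning and the partition_table layout are assumptions for illustration.

```python
import hashlib

PARTITION_POWER = 16  # 2**16 abstract partitions (assumed)

def key_to_partition(key: str) -> int:
    """Consistent-hash step: map a request key to an abstract partition number."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) >> (128 - PARTITION_POWER)

def partition_to_resource(partition: int, partition_table: dict) -> str:
    """Constrained-mapping step: map the partition number to an actual cluster or resource."""
    return partition_table[partition]
```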
Abstract:
Apparatus and methods are disclosed for reading data from non-volatile integrated circuit memory devices, such as NAND flash. For example, the disclosed techniques can be embodied in a device driver of an operating system. Errors are tracked during read operations. If sufficient errors are observed during read operations, the affected block is retired when it is requested to be erased or a page of the block is to be written. One embodiment is a technique to recover data from uncorrectable errors. For example, a read mode can be changed to a more reliable read mode to attempt to recover the data. One embodiment further returns data from the memory device regardless of whether the data was correctable by decoding of the error correction code data.
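A sketch of that policy, assuming a hypothetical driver-level flash API (flash.read, flash.erase, flash.retire); the threshold and call signatures are illustrative, not the disclosed driver.

```python
ERROR_RETIRE_THRESHOLD = 8        # assumed threshold
read_errors = {}                  # block number -> accumulated read error count

def read_page(flash, block, page):
    data, had_error, uncorrectable = flash.read(block, page)            # assumed API
    if had_error:
        read_errors[block] = read_errors.get(block, 0) + 1
    if uncorrectable:
        # Switch to a more reliable read mode to attempt data recovery.
        data, _, uncorrectable = flash.read(block, page, mode="reliable")
    return data                    # returned even if still uncorrectable

def erase_block(flash, block):
    # Retire the block at erase time if too many read errors have accumulated.
    if read_errors.pop(block, 0) >= ERROR_RETIRE_THRESHOLD:
        flash.retire(block)                                              # assumed API
    else:
        flash.erase(block)
```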
Abstract:
An access control method and system, and an access point, are disclosed. When a fault occurs in an access controller (AC), an access point (AP) configures a network-layer interface of the AP according to an Internet Protocol (IP) address and a media access control (MAC) address of the AC that were obtained by means of pre-learning, and the AP then routes a received packet to a Web server on a wireless local area network (WLAN) using the configured network-layer interface, where the packet is used by a first station (STA) to request access to an external server. Interconnection and interworking among wireless local area networks are thereby implemented, and the breakdown of a wireless local area network that a fault in an AC would otherwise cause in a centralized network architecture is avoided.
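A minimal sketch of the failover step, with an assumed AP API; the configure_interface and forward calls are illustrative only.

```python
def on_ac_fault(ap, learned_ac_ip, learned_ac_mac):
    # Configure the AP's network-layer interface with the AC's pre-learned addresses.
    ap.configure_interface(ip=learned_ac_ip, mac=learned_ac_mac)        # assumed API

def handle_sta_packet(ap, packet, web_server_ip):
    # A packet by which a station requests access to an external server is
    # routed to the WLAN's Web server over the configured interface.
    if packet.is_access_request:
        ap.forward(packet, via=ap.network_layer_interface, dest=web_server_ip)  # assumed API
```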
Abstract:
Embodiments of the present invention disclose a method, computer program product, and system for memory replication. In one embodiment, in accordance with the present invention, the computer-implemented method includes the steps of executing a mobile agent on a server node, wherein the server node is within a cluster of server nodes connected via network communications; capturing and storing, by the mobile agent, a memory state of the server node during operation of the server node; monitoring the server node to determine whether the server node has failed; and, responsive to determining that the server node has failed, migrating the mobile agent to an active server node within the cluster of server nodes, wherein the mobile agent carries the captured memory state.
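A minimal sketch, with assumed node and cluster APIs, of the capture-monitor-migrate loop; the snapshot_memory, is_alive, pick_active_node, and receive_agent calls are hypothetical.

```python
import time

POLL_SECS = 5  # assumed monitoring interval

class MobileAgent:
    """Shadows one server node, carrying its captured memory state."""

    def __init__(self, node):
        self.node = node
        self.captured_state = None

    def monitor(self, cluster):
        # Capture and store the node's memory state while the node is healthy.
        while self.node.is_alive():
            self.captured_state = self.node.snapshot_memory()
            time.sleep(POLL_SECS)
        # The node has failed: migrate to an active node, carrying the state.
        target = cluster.pick_active_node(exclude=self.node)
        target.receive_agent(self)
        self.node = target
```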
Abstract:
A recovering method is adapted to an encoding operation performed on a storage area of a storage device. The recovering method includes: reading a variable set, wherein the encoding operation comprises a plurality of sub-operations, and each of the sub-operations corresponds to at least one flag variable in the variable set; determining whether any one of the sub-operations has been interrupted according to the variable set; when one of the sub-operations has been interrupted, recovering that sub-operation according to the at least one flag variable corresponding to it; and carrying on the encoding operation according to the progress recorded by the flag variables in the variable set.
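Sketched in Python under the assumption that each sub-operation records a flag ("pending", "in_progress", or "done") in a persisted variable set; the names and flag values are illustrative.

```python
def recover_and_continue(variable_set, sub_operations):
    """sub_operations: ordered list of (name, run_fn, recover_fn) tuples."""
    for name, run, recover in sub_operations:
        flag = variable_set.get(name, "pending")
        if flag == "done":
            continue                      # finished before the interruption
        if flag == "in_progress":
            recover()                     # recover the interrupted sub-operation
        else:
            run()                         # carry on the encoding operation
        variable_set[name] = "done"
```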