Abstract:
When a media error occurs on one of a number of storage devices of a redundant array, the logical stripe of data affected by the media error is determined. A portion of non-volatile memory is reserved and the logical stripe is backed up to this portion of non-volatile memory. Subsequent read requests are serviced from the non-volatile memory rather than from the storage devices. When a write request is received, it is first attempted against the storage devices. If the write succeeds, the previously reserved portion of non-volatile memory is freed and subsequent requests are serviced using the storage devices. If it fails, the write request is serviced using the non-volatile memory.
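As a rough illustration of the flow above, the following Python sketch models the reserved non-volatile memory as an in-memory map and the storage devices as dict-like block stores; all class and method names are invented for illustration rather than taken from the disclosure.

class RaidWithNvmFallback:
    def __init__(self, devices):
        self.devices = devices          # list of dict-like block devices
        self.nvm_backup = {}            # stripe_id -> stripe data backed up to reserved NVM

    def handle_media_error(self, stripe_id):
        # Determine the affected logical stripe and back it up to the reserved NVM region.
        self.nvm_backup[stripe_id] = self._read_stripe_from_devices(stripe_id)

    def read(self, stripe_id):
        # While a backup exists, reads are serviced from NVM, not from the devices.
        if stripe_id in self.nvm_backup:
            return self.nvm_backup[stripe_id]
        return self._read_stripe_from_devices(stripe_id)

    def write(self, stripe_id, data):
        # Writes are attempted against the storage devices first.
        if self._write_stripe_to_devices(stripe_id, data):
            # Success: free the reserved NVM; subsequent requests use the devices.
            self.nvm_backup.pop(stripe_id, None)
        else:
            # Failure: service the write using the non-volatile memory instead.
            self.nvm_backup[stripe_id] = data

    def _read_stripe_from_devices(self, stripe_id):
        return b"".join(d.get(stripe_id, b"") for d in self.devices)

    def _write_stripe_to_devices(self, stripe_id, data):
        try:
            for d in self.devices:
                d[stripe_id] = data     # placeholder for real striping and parity updates
            return True
        except OSError:
            return False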
Abstract:
The present disclosure includes detecting a failure associated with a first storage location on which a first agent virtual computing instance (AVCI) is deployed, wherein the first AVCI is being executed by a first hypervisor, and stopping the execution of the first AVCI. It further includes determining whether a second AVCI that provides services analogous to the first AVCI is being executed by a second hypervisor and is deployed on a second storage location; creating a linked clone of the second AVCI on the second storage location responsive to the second AVCI being executed by the second hypervisor and deployed on the second storage location; redeploying the first AVCI on the second storage location responsive to the second AVCI not being executed by the second hypervisor or not being deployed on the second storage location; and deleting files of the first AVCI from the first storage location after the failure is corrected.
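A minimal Python sketch of this recovery logic follows, assuming hypothetical objects with stop(), find_analogous(), create_linked_clone(), redeploy(), and delete_files() operations; none of these names come from the disclosure.

def recover_agent_vci(first_avci, second_store, second_hypervisor):
    # Stop execution of the AVCI affected by the storage failure.
    first_avci.stop()

    # Look for an analogous AVCI running under the second hypervisor.
    second_avci = second_hypervisor.find_analogous(first_avci)
    if second_avci is not None and second_avci.datastore is second_store:
        # Analogous AVCI runs from the second storage location: make a linked clone there.
        replacement = second_avci.create_linked_clone(on=second_store)
    else:
        # Otherwise redeploy the first AVCI on the second storage location.
        replacement = first_avci.redeploy(on=second_store)
    return replacement

def cleanup_after_repair(first_avci, first_store):
    # After the failure is corrected, delete the stale files of the first AVCI.
    first_store.delete_files(first_avci)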
Abstract:
Provided are a computer program product, system, and method for rebuilding damaged areas of a volume table using a volume data set, for managing data sets that are assigned data units in a volume in a storage. A determination is made of damaged areas in a volume table providing information on data sets allocated in the volume. The determined damaged areas are formatted to produce reformatted areas that make the volume table usable. A volume data set in the volume, having information on data sets configured in the volume, is processed to determine salvaged data sets, comprising the data sets in the volume that are not indicated in the volume table. Data set information is rebuilt in the reformatted areas of the volume table for the salvaged data sets.
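The rebuild steps can be sketched as follows in Python, assuming a dict stands in for the volume table (area index to data set entries) and a list of records stands in for the volume data set; the round-robin placement is an arbitrary simplification.

import itertools

def rebuild_volume_table(volume_table, damaged_areas, volume_data_set):
    # 1. Format the damaged areas to produce usable, reformatted areas.
    reformatted = []
    for area in damaged_areas:
        volume_table[area] = {}
        reformatted.append(area)

    # 2. Salvaged data sets: listed in the volume data set but no longer
    #    indicated anywhere in the volume table.
    indicated = {name for entries in volume_table.values() for name in entries}
    salvaged = [ds for ds in volume_data_set if ds["name"] not in indicated]

    # 3. Rebuild data set information for the salvaged data sets in the
    #    reformatted areas.
    for area, ds in zip(itertools.cycle(reformatted), salvaged):
        volume_table[area][ds["name"]] = {"extents": ds["extents"]}
    return salvaged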
Abstract:
Embodiments include a method for temporary pipeline marking for processor error workarounds. The method includes monitoring a pipeline of a processor for an event that is predetermined to place the processor in a stuck state, where the stuck state results in an errant instruction execution result or in repeated resource contention that causes performance degradation. The pipeline is marked for a workaround action based on detecting the event. A clearing action is triggered based on the marking of the pipeline. The marking of the pipeline is cleared based on the triggering of the clearing action.
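A small Python sketch of the monitor/mark/clear cycle is shown below; the event names and the particular clearing actions are invented placeholders, since the abstract does not enumerate them.

class PipelineWorkaroundMonitor:
    # Invented examples of events predetermined to indicate a stuck state
    # or repeated resource contention.
    STUCK_EVENTS = {"dispatch_stall_loop", "repeated_issue_reject"}

    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.marked = False

    def on_event(self, event):
        # Monitor the pipeline; mark it for a workaround on a matching event.
        if event in self.STUCK_EVENTS and not self.marked:
            self.marked = True
            self.trigger_clearing_action()

    def trigger_clearing_action(self):
        # Clearing action triggered by the marking, e.g. flush and reissue.
        self.pipeline.flush()
        self.pipeline.reissue_oldest_instruction()
        self.marked = False               # the temporary marking is cleared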
Abstract:
Embodiments of the present invention disclose a method for recovery of a two-phase commit transaction. A computer receives an end command prior to completing execution of a prepare command for a first transaction identifier. The computer determines whether a failure and restart occurred within a distributed data processing environment after a resource manager received the end command. The computer responds to a determination that the failure and restart did occur within the distributed data processing environment by retrieving the first transaction identifier from a data store. The computer then transmits a rollback command for the retrieved first transaction identifier to the resource manager.
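The recovery path might be sketched as follows, with data-store and resource-manager objects assumed to expose load_transaction_id() and rollback() operations; these interfaces are illustrative assumptions, not the disclosed API.

def recover_in_doubt_transaction(data_store, resource_manager, failure_and_restart_detected):
    # Only act when a failure and restart occurred after the end command
    # was received but before the prepare command completed.
    if not failure_and_restart_detected:
        return None
    # Retrieve the first transaction identifier persisted before the failure.
    xid = data_store.load_transaction_id()
    # Transmit a rollback command for that identifier to the resource manager.
    resource_manager.rollback(xid)
    return xid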
Abstract:
Technologies for providing manageability redundancy for micro server and clustered System-on-a-Chip (SoC) deployments are presented. A configurable multi-processor apparatus may include multiple integrated circuit (IC) blocks where each IC block includes a task block to perform one or more assignable task functions and a management block to perform management functions with respect to the corresponding IC block. Each task block and each management block may include one or more instruction processors and corresponding memory. Each IC block may be controllable to perform a function of one or more other IC blocks. The IC blocks may communicate with each other via a management communication infrastructure that may include a communication path from each of the management blocks to each of the other management blocks. Via the management communication infrastructure, the management blocks may bridge communication paths between pairs of management blocks.
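As an illustration only, the following Python toy model shows management blocks that can reach every peer and bridge messages between pairs of peers; the topology, message format, and method names are assumptions, not part of the disclosure.

class ManagementBlock:
    def __init__(self, block_id):
        self.block_id = block_id
        self.peers = {}                  # block_id -> ManagementBlock

    def connect(self, other):
        # Management communication infrastructure: a path to each other block.
        self.peers[other.block_id] = other
        other.peers[self.block_id] = self

    def send(self, dest_id, message, via=None):
        if via is not None:
            # Bridge the message through an intermediate management block.
            return self.peers[via].bridge(self.block_id, dest_id, message)
        return self.peers[dest_id].receive(self.block_id, message)

    def bridge(self, src_id, dest_id, message):
        # Forward traffic between a pair of peer management blocks.
        return self.peers[dest_id].receive(src_id, message)

    def receive(self, src_id, message):
        return f"{self.block_id} got {message!r} from {src_id}"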
Abstract:
Embodiments relate to methods and computer program products that prioritize the logical units in a subgroup. Thereafter, in case of abnormal operation of the process for copying the consistency group from primary storage to secondary storage, low-priority logical units of the subgroups of the consistency group are not copied from primary storage to secondary storage.
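A minimal sketch of the prioritized copy, assuming each logical unit is represented as a record with a priority field and that an abnormal flag marks abnormal operation of the copy process (both assumptions):

def copy_consistency_group(subgroups, copy_fn, abnormal):
    for subgroup in subgroups:
        for lun in subgroup:
            if abnormal and lun["priority"] == "low":
                continue                  # low-priority logical units are not copied
            copy_fn(lun)                  # copy from primary to secondary storage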
Abstract:
A Flash-based memory system comprises a plurality of Flash memory devices, a Flash controller communicating independently with each Flash memory device to perform memory operations, a power circuit providing power to the Flash memory devices, and a CPU configured to perform a controlled powering-down procedure upon detecting a power failure. In some embodiments, the Flash-based memory system includes a backup power source having a charge storage device and charging circuitry, with the CPU configured to perform one or more test procedures on the charge storage device to provide an indication of its charge storage capacity. A plurality of Flash-based memory systems may be mounted on a Flash-based memory card, and multiple such Flash-based memory cards may be combined into a Flash-based memory module. A number of Flash-based memory modules may then be removably mounted in a rack-mountable housing to form a unitary Flash-based memory unit.
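A hedged Python sketch of the controlled powering-down procedure and the charge-storage test follows; the device interfaces and the capacity threshold are invented for illustration and are not taken from the disclosure.

class FlashSystemCpu:
    def __init__(self, flash_controller, backup_power):
        self.flash_controller = flash_controller
        self.backup_power = backup_power

    def on_power_failure(self):
        # Controlled powering-down: stop new work, flush in-flight writes,
        # then put each Flash device into a safe state.
        self.flash_controller.stop_accepting_commands()
        self.flash_controller.flush_outstanding_writes()
        for device in self.flash_controller.devices:
            self.flash_controller.power_down(device)

    def test_backup_charge(self, minimum_joules=5.0):
        # Test procedure on the charge storage device: estimate remaining
        # capacity and report whether it meets an assumed minimum.
        measured = self.backup_power.measured_capacity_joules()
        return measured >= minimum_joules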
Abstract:
Apparatuses, systems, and methods are disclosed for tolerating faults in a communications grid. Specifically, various techniques and systems are provided for detecting a fault or failure by a node in a network of computer nodes in a communications grid, adjusting the grid to avoid grid failure, and taking action based on the failure. In an example, a system may perform operations including receiving grid status information at a backup control node, the grid status information including a project status; storing the grid status information within the backup control node; receiving a failure communication including an indication that a primary control node has failed; designating the backup control node as a new primary control node; receiving updated grid status information based on the indication that the primary control node has failed; and transmitting a set of instructions based on the updated grid status information.
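The backup control node's behavior might look like the following sketch, where the message shapes (grid status, failure notice) are assumed dictionaries rather than anything specified by the disclosure.

class BackupControlNode:
    def __init__(self, worker_nodes):
        self.worker_nodes = worker_nodes
        self.grid_status = {}            # latest grid status, including project status
        self.is_primary = False

    def on_grid_status(self, status):
        # Store received grid status information locally.
        self.grid_status = status

    def on_failure_notice(self, notice):
        if notice.get("failed") == "primary_control_node":
            # Designate this backup node as the new primary control node.
            self.is_primary = True
            self.grid_status = notice.get("updated_status", self.grid_status)
            self.transmit_instructions()

    def transmit_instructions(self):
        # Transmit instructions based on the updated grid status.
        for node in self.worker_nodes:
            node.assign(self.grid_status.get("pending_work", []))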
Abstract:
For disaster recovery involving a first site and a disaster recovery site, where at least a portion of the management service metadata is not isolated within the management service, a failover process is initiated, including creating an initial snapshot of the distributed metadata state. In a failback process, a representation is created of state changes for the management service and a delta description is calculated therefrom. The delta description is transmitted to the first site, and a reverse replica of all the workload components from the disaster recovery site is created at the first site. The delta description is played back to restore the distributed metadata state that existed in the disaster recovery site and to re-create it in the first site.
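A rough sketch of the snapshot, delta, and playback handling, modeling the distributed metadata state as a plain dictionary and assuming a replicate_to_first_site() operation on workload components (both assumptions):

def failover_snapshot(metadata_state):
    # Initial snapshot of the distributed metadata state at failover time.
    return dict(metadata_state)

def compute_delta(snapshot, current_state):
    # Delta description: entries added or changed since the snapshot.
    return {k: v for k, v in current_state.items() if snapshot.get(k) != v}

def failback(first_site_state, delta, workloads):
    # Create reverse replicas of the workload components at the first site,
    # then play back the delta to re-create the metadata state there.
    reverse_replicas = [w.replicate_to_first_site() for w in workloads]
    first_site_state.update(delta)
    return reverse_replicas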