Abstract:
A method for performing process fault-tolerant control of an electronic device, together with an associated apparatus and an associated computer program product, is provided. The method includes: using at least one driver in a kernel layer of an operating system (OS) of the electronic device to detect whether a specific process running on the electronic device will be influenced by an error of the electronic device; and, when such influence is detected, using at least one control signal of the OS to perform process control on the specific process and using a package manager service (PMS) module of the OS to trigger a rescue procedure. For example, the method may further include: when triggering the rescue procedure, preventing termination of the specific process from being triggered immediately.
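The flow described above (detect, apply a control signal, trigger a rescue, and avoid immediate termination) can be sketched as a small simulation. This is only an illustrative model: the class name, method name, and action labels are hypothetical, and the choice of "SIGSTOP" as the control signal is an assumption, since the abstract does not name a specific signal.

```python
from dataclasses import dataclass, field

# Assumption: the abstract does not name the control signal.
# "SIGSTOP" (pause rather than kill) is used purely for illustration.
PAUSE_SIGNAL = "SIGSTOP"


@dataclass
class FaultTolerantController:
    """Hypothetical model of the control flow; no real signals are sent."""
    log: list = field(default_factory=list)

    def on_driver_report(self, pid: int, will_be_influenced: bool) -> list:
        """React to a simulated kernel-driver detection result."""
        if not will_be_influenced:
            return []  # nothing to do for unaffected processes
        steps = [
            ("control_signal", pid, PAUSE_SIGNAL),  # process control, not termination
            ("pms_trigger_rescue", pid),            # rescue procedure via the PMS module
        ]
        self.log.extend(steps)
        return steps


controller = FaultTolerantController()
print(controller.on_driver_report(pid=1234, will_be_influenced=True))
```

Note that no termination step appears in the returned list, which mirrors the optional feature of not terminating the process immediately when the rescue procedure is triggered.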
Abstract:
A system and method for high-speed data recording includes a control computer and a disk pack unit. Each disk pack is housed within a shell that provides handling and protection for the disk pack. The disk pack unit provides cooling of the disks and connections for power and disk signaling. A standard connection is provided between the control computer and the disk pack unit. The disk pack units are self-sufficient and able to connect to any computer. Multiple disk packs are connected simultaneously to the system, so that one disk pack can be active while one or more other disk packs are inactive. To control for power surges, the power to each disk pack is controlled programmatically for the group of disks in that disk pack.
Abstract:
A method for performing error recovery that includes creating, by a processor, a software recovery checkpoint. The processor is dynamically switched into a non-recoverable processing mode of operation based on creating the software recovery checkpoint. The non-recoverable processing mode of operation is a mode in which a subset of hardware error recovery resources is powered down or re-purposed for instruction processing. It is determined, during the non-recoverable processing mode of operation, that a new software recovery checkpoint is required. Based on the determining that a new software recovery checkpoint is required, the processor is dynamically switched into a recoverable processing mode of operation. The recoverable processing mode of operation is a mode in which hardware error recovery resources, including at least one of the hardware error recovery resources in the subset, are purposed for hardware error recovery operations.
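The mode switching described above can be pictured as a two-state machine. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, the processor is assumed to start in the recoverable mode, and the powering down or re-purposing of hardware resources is reduced to a mode flag.

```python
from enum import Enum


class Mode(Enum):
    RECOVERABLE = "recoverable"
    NON_RECOVERABLE = "non-recoverable"


class Processor:
    """Hypothetical model; real hardware resources are not represented."""

    def __init__(self):
        self.mode = Mode.RECOVERABLE  # assumption: initial mode
        self.checkpoints = 0

    def create_checkpoint(self):
        # Creating a software recovery checkpoint triggers the switch into
        # the non-recoverable mode (recovery resources freed for other work).
        self.checkpoints += 1
        self.mode = Mode.NON_RECOVERABLE

    def require_new_checkpoint(self):
        # Determining that a new checkpoint is required switches the
        # processor back so recovery resources serve error recovery again.
        self.mode = Mode.RECOVERABLE


cpu = Processor()
cpu.create_checkpoint()
print(cpu.mode)
cpu.require_new_checkpoint()
print(cpu.mode)
```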
Abstract:
A computer program product for performing error recovery is configured to perform a method that includes creating, by a processor, a software recovery checkpoint. The processor is dynamically switched into a non-recoverable processing mode of operation based on creating the software recovery checkpoint. The non-recoverable processing mode of operation is a mode in which a subset of hardware error recovery resources is powered down or re-purposed for instruction processing. It is determined, during the non-recoverable processing mode of operation, that a new software recovery checkpoint is required. Based on the determining that a new software recovery checkpoint is required, the processor is dynamically switched into a recoverable processing mode of operation. The recoverable processing mode of operation is a mode in which hardware error recovery resources, including at least one of the hardware error recovery resources in the subset, are purposed for hardware error recovery operations.
Abstract:
Provided are systems and methods for accessing a storage device from a node when a local connection failure occurs between the node and the storage device. A failure is determined to have occurred at a first node access path between a first node and a storage device that prevents an application at the first node from accessing the storage device from the first node access path. An access request is sent from the first node to a second node. The second node has a second node access path to the storage device. A determination is made that the second node can communicate with the storage device. The storage device is accessed by an application at the first node via the second node access path.
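The failover sequence above can be sketched as follows. This is an illustrative model only: the `Node` class, its attributes, and the dictionary standing in for the storage device are all hypothetical, and the access request between nodes is reduced to a direct method call.

```python
class Node:
    """Hypothetical node with its own access path and an optional peer."""

    def __init__(self, name, local_path_ok, peer=None):
        self.name = name
        self.local_path_ok = local_path_ok
        self.peer = peer

    def can_reach_storage(self):
        # Stands in for the determination that a node can communicate
        # with the storage device.
        return self.local_path_ok

    def read(self, storage, key):
        """Return (value, name of the node whose access path was used)."""
        if self.local_path_ok:
            return storage[key], self.name
        # Local path failed: send the access request to the peer node.
        if self.peer is not None and self.peer.can_reach_storage():
            return self.peer.read(storage, key)
        raise ConnectionError(f"{self.name}: no usable access path")


storage = {"block0": b"data"}
second = Node("node2", local_path_ok=True)
first = Node("node1", local_path_ok=False, peer=second)
print(first.read(storage, "block0"))
```

The application on the first node still issues the read itself; only the path taken to the storage device changes.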
Abstract:
Embodiments of the disclosure are directed to an apparatus that comprises a first core susceptible to an error condition, and a second core configured to perform a diagnostic on the first core to identify a cause of the error condition and an action to remedy the error condition in order to recover the first core.
Abstract:
A memory storage device and a repairing method thereof are provided. The memory storage device has a rewritable non-volatile memory module having multiple physical units. The physical units include at least one backup physical unit that is configured to be accessed only by a specific command set and that stores at least one piece of customized data. The method includes: receiving, from a host system, a specific read command for reading the backup physical unit, and transmitting the customized data therein to the host system, when the memory storage device is capable of receiving and processing commands from the host system, wherein the specific read command belongs to the specific command set; and, upon receiving from the host system a write command for writing the customized data, writing the customized data from the host system into a corresponding physical unit to restore the memory storage device to a factory setting.
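The read-then-write-back repair cycle described above can be sketched with a toy device model. The command names `VENDOR_READ_BACKUP` and `WRITE_SYSTEM`, the class, and its attributes are hypothetical placeholders; the abstract does not specify the command set or the layout of the physical units.

```python
class StorageDevice:
    """Hypothetical device with one backup unit and one system unit."""

    def __init__(self, factory_data: bytes):
        self._backup_unit = factory_data  # reachable only via the special command
        self.system_unit = factory_data   # unit that may become corrupted

    def handle(self, command: str, payload: bytes = None):
        if command == "VENDOR_READ_BACKUP":  # assumed member of the specific command set
            return self._backup_unit
        if command == "WRITE_SYSTEM":        # assumed write command from the host
            self.system_unit = payload
            return None
        raise ValueError(f"unsupported command: {command}")


device = StorageDevice(b"factory-settings")
device.system_unit = b"corrupted"                  # simulate damage
data = device.handle("VENDOR_READ_BACKUP")         # host reads the customized data
device.handle("WRITE_SYSTEM", data)                # host writes it back to restore
print(device.system_unit == b"factory-settings")
```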
Abstract:
An apparatus for handling anomalies in a hardware system including a master device and at least one slave device coupled with the master device through an interconnect device is provided. The apparatus includes at least one controller operative to receive status information relating to the slave device. The status information is indicative of whether an anomaly is present in the slave device and/or the interconnect device. The controller is operative to generate output response information as a function of the status information relating to the slave device for detecting and/or responding to hardware system anomalies in a manner which reduces a need for resetting the hardware system to return to normal operation.
Abstract:
In accordance with embodiments of the present disclosure, a method may comprise identifying one or more portions of a memory having defects. The method may also include storing one or more addresses in a memory defect list, each of the one or more addresses associated with a portion of the one or more identified portions. The method may further include indicating to components of an information handling system that the one or more identified portions are unusable, such that the components are prevented from allocating and using the one or more identified portions.
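A memory defect list of this kind can be illustrated with a toy page allocator that refuses to hand out pages recorded as defective. The class, its methods, and the page-index addressing are hypothetical simplifications, not the mechanism claimed in the abstract.

```python
class DefectAwareAllocator:
    """Hypothetical allocator that skips pages on the defect list."""

    def __init__(self, num_pages: int):
        self.num_pages = num_pages
        self.defect_list = set()  # addresses (page indices) of defective portions
        self.allocated = set()

    def mark_defective(self, page: int):
        # Store the address of an identified defective portion.
        self.defect_list.add(page)

    def allocate(self) -> int:
        # Return the lowest free page that is not marked unusable.
        for page in range(self.num_pages):
            if page not in self.defect_list and page not in self.allocated:
                self.allocated.add(page)
                return page
        raise MemoryError("no usable pages left")


allocator = DefectAwareAllocator(num_pages=4)
allocator.mark_defective(0)
allocator.mark_defective(2)
print(allocator.allocate(), allocator.allocate())
```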