Abstract:
PROBLEM TO BE SOLVED: To provide a computer-implemented method, apparatus and mechanism for recovering an I/O fabric that has become terminally congested or deadlocked because a failure caused buffers/queues to fill, so that the root complexes lose access to their I/O subsystems. SOLUTION: Upon detection of a terminally congested or deadlocked transmit queue, access to that queue by other root complexes is suspended while each item in the queue is examined and processed accordingly. Store requests and DMA read reply packets in the queue are discarded, and load requests in the queue are processed by returning a special completion packet. Access to the queue by the root complexes is then resumed. COPYRIGHT: (C)2008,JPO&INPIT
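The recovery sequence described above can be sketched in a few lines. This is only an illustrative model, not the claimed implementation: the `Packet` and `TransmitQueue` types, the packet kind labels, and the `("error_completion", tag)` tuple are all hypothetical names, and the sketch assumes that any item other than a load request is simply dropped.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical labels for the request types named in the abstract.
STORE, DMA_READ_REPLY, LOAD = "store", "dma_read_reply", "load"


@dataclass
class Packet:
    kind: str
    tag: int  # hypothetical identifier used to match a completion to its request


class TransmitQueue:
    def __init__(self):
        self.items = deque()
        self.suspended = False

    def enqueue(self, packet):
        # While recovery is running, root complexes may not touch the queue.
        if self.suspended:
            raise RuntimeError("queue access suspended during recovery")
        self.items.append(packet)


def recover(queue):
    """Drain a deadlocked queue and return the completions owed to load requests."""
    queue.suspended = True
    completions = []
    try:
        while queue.items:
            packet = queue.items.popleft()
            if packet.kind == LOAD:
                # A load must be answered, so return a special completion.
                completions.append(("error_completion", packet.tag))
            # Assumption: everything else (store requests, DMA read replies)
            # is discarded without a reply.
    finally:
        queue.suspended = False  # resume access for the root complexes
    return completions
```

Returning the completions as a list keeps the sketch free of any transport detail; a real fabric would send them back toward the requesting root complex.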
Abstract:
A method of operating self-testing logic in a tree-like multi-chip processor cluster which generates an infrastructure signal (430), such as a clock-stop or trace-stop signal, used for error management and recovery. The method intercepts (440) the infrastructure signal of a processor of the cluster and extracts error information from it. Using the error information, a pre-defined inter-chip error synchronisation scheme is selected (450), including clock-stop and/or trace-stop information for a respective one of the processors of the cluster. Notification signals are distributed (490) to chips of the cluster, using dedicated wires or a low-level standard interface for chip-to-chip communication, to prepare and execute error-related internal operations on those chips. On receipt of a notification signal, a chip performs at least one of (i) a trace-stop command or (ii) a clock-stop command (495), as derived from the synchronisation scheme. The synchronisation scheme may comprise a configurable delay that is adjustable according to the location of the failure within the chip.
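As a rough illustration of the select-and-distribute steps, the sketch below maps an intercepted signal to a scheme and builds one notification per chip. The scheme table, the failure-location labels, the delay values, and all names are assumptions made for the example; the abstract does not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfrastructureSignal:
    chip: str      # chip that raised the signal
    kind: str      # "clockstop" or "tracestop"
    location: str  # hypothetical failure-location label, e.g. "core" or "cache"


# Hypothetical scheme table: failure location -> (command, delay in cycles).
SCHEMES = {
    "core": ("clockstop", 0),
    "cache": ("tracestop", 16),
}
DEFAULT_SCHEME = ("tracestop", 0)


def select_scheme(signal):
    """Pick the pre-defined scheme from the error information in the signal."""
    return SCHEMES.get(signal.location, DEFAULT_SCHEME)


def build_notifications(signal, cluster):
    """Return the (command, delay) each chip in the cluster should execute."""
    command, delay = select_scheme(signal)
    return {chip: (command, delay) for chip in cluster}
```

The delay is looked up by failure location only to mirror the configurable delay mentioned in the abstract; how the delay is actually derived is not stated there.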
Abstract:
The invention relates to a method for providing improved reliability of any node attaching to an InfiniBand fabric, the method comprising the steps of: a) providing a first and a second physical Channel Adapter having a first and a second number of ports, b) providing program means for registering the first and second physical Channel Adapters as one logical Channel Adapter having a number of first and second ports, c) providing first and second caching means for storing first and second control information for the first and second Channel Adapters, d) providing system memory means for storing the first and second control information, and e) providing means for copying the first control information from the system memory to the second caching means in case of a failure of the first Channel Adapter and for initiating an Automatic Path Migration from the first number of ports to the second number of ports.
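Steps a) to e) can be modelled as a small data structure. The sketch below is a toy model under stated assumptions: the class names, the dictionary-based caches, and the `fail_over` method are hypothetical, and switching `active` merely stands in for Automatic Path Migration rather than implementing it.

```python
class ChannelAdapter:
    """One physical adapter with its ports and a local control-information cache."""

    def __init__(self, name, ports):
        self.name = name
        self.ports = list(ports)
        self.cache = {}
        self.failed = False


class LogicalChannelAdapter:
    """Two physical adapters registered as one logical adapter (steps a and b)."""

    def __init__(self, first, second):
        self.first, self.second = first, second
        # Step d: system memory holds control information for both adapters.
        self.system_memory = {first.name: {}, second.name: {}}
        self.active = first

    @property
    def ports(self):
        return self.first.ports + self.second.ports

    def store_control(self, adapter, key, value):
        # Step c: write through to system memory and the adapter's own cache.
        self.system_memory[adapter.name][key] = value
        adapter.cache[key] = value

    def fail_over(self):
        # Step e: copy the failed adapter's control information from system
        # memory into the surviving adapter's cache, then switch ports.
        failed = self.active
        standby = self.second if failed is self.first else self.first
        failed.failed = True
        standby.cache.update(self.system_memory[failed.name])
        self.active = standby  # placeholder for Automatic Path Migration
        return standby.ports
```

Keeping the authoritative copy in system memory is what makes the copy in step e) possible after the first adapter, and its cache, have become unreachable.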
Abstract:
The method involves storing data relating to an operation path in a memory (15), the operation path containing a description of a sequence of operations. A unique operation ID is assigned to each operation. The ID remains constant while the operation is processed by a number of functional units of the computer system to be monitored. Each operation is assigned to an associated operation graph (14) containing status control data for the functional units, and the contents of the memory are evaluated to obtain tracking data. A computer system is also claimed.
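The role of the constant operation ID can be shown with a minimal sketch. The `OperationTracker` class and its record layout are hypothetical, and the operation graph with its status control data is deliberately left out; the example only shows how entries from several functional units are tied together by one ID and evaluated afterwards.

```python
import itertools


class OperationTracker:
    def __init__(self):
        self._ids = itertools.count(1)
        self.memory = []  # trace memory: (operation ID, functional unit, status)

    def new_operation(self):
        """Assign a unique ID that stays constant for the operation's lifetime."""
        return next(self._ids)

    def record(self, op_id, unit, status):
        """Called by each functional unit as it processes the operation."""
        self.memory.append((op_id, unit, status))

    def path(self, op_id):
        """Evaluate the memory contents to recover one operation's path."""
        return [(unit, status) for oid, unit, status in self.memory if oid == op_id]
```

Because entries from different operations interleave in the memory, filtering by ID is what turns the raw trace into per-operation tracking data.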
Abstract:
The method and apparatus relate to hardware-to-hardware data transmission in computer systems, and in particular to a method and system for operating I/O adapters that attach one or more computing devices to an I/O periphery, to a network, or to other computing devices. It is proposed to operate a memory local to the network coupling adapter as a cache memory, relative to a system memory associated with the one or more computing devices, for storing transmission control information.
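A minimal sketch of the idea, assuming a dictionary stands in for system memory and that the adapter-local memory evicts the least recently used entry when full. The abstract does not name a replacement policy, so LRU is an assumption here, as are the class name, the `capacity` parameter, and the `misses` counter.

```python
from collections import OrderedDict


class AdapterCache:
    """Adapter-local memory used as a cache over control blocks in system memory."""

    def __init__(self, system_memory, capacity):
        self.system_memory = system_memory  # mapping: id -> control block
        self.capacity = capacity
        self.lines = OrderedDict()          # cached entries, oldest first
        self.misses = 0

    def lookup(self, key):
        if key in self.lines:
            self.lines.move_to_end(key)     # mark as most recently used
            return self.lines[key]
        # Miss: fetch the control block from system memory.
        self.misses += 1
        value = self.system_memory[key]
        self.lines[key] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value
```

The point of the arrangement is that system memory can hold control information for far more connections than the adapter's local memory, while frequently used entries are still served locally.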
Abstract:
Information units (IUs) are aligned in a block boundary format, and block protection trailer data is added to each IU. In a data storage and transmission system, information units of variable-length data records, such as extended count key data (ECKD), may be stored and transferred according to a fixed-length block protocol such as the T10/SCSI standard. Protection data, such as a cyclic redundancy check (CRC) hash, is added to the end of each block. Padding data may be added to the variable-length data when forming the fixed-length blocks. The fixed-length blocks may subsequently be re-assembled into ECKD if required.
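The padding, trailer, and re-assembly steps can be sketched as follows. Several details are assumptions made only for this example: the 512-byte block payload, the 4-byte length header used to strip the padding again, and the use of CRC-32 from `zlib` as a stand-in for the protection data. None of this reproduces the actual T10 protection format.

```python
import struct
import zlib

BLOCK_DATA = 512  # assumed payload bytes per fixed-length block
TRAILER = 4       # assumed trailer size: one CRC-32 value


def to_blocks(record):
    """Split a variable-length record into padded blocks with a CRC trailer each."""
    # Assumption: a length header lets the receiver remove the padding later.
    framed = struct.pack(">I", len(record)) + record
    framed += b"\x00" * ((-len(framed)) % BLOCK_DATA)  # pad to a block boundary
    blocks = []
    for offset in range(0, len(framed), BLOCK_DATA):
        data = framed[offset:offset + BLOCK_DATA]
        blocks.append(data + struct.pack(">I", zlib.crc32(data)))
    return blocks


def from_blocks(blocks):
    """Verify each block's trailer and re-assemble the original record."""
    framed = bytearray()
    for block in blocks:
        data, trailer = block[:-TRAILER], block[-TRAILER:]
        (crc,) = struct.unpack(">I", trailer)
        if zlib.crc32(data) != crc:
            raise ValueError("block CRC mismatch")
        framed += data
    (length,) = struct.unpack(">I", framed[:4])
    return bytes(framed[4:4 + length])
```

A 700-byte record, for instance, becomes 704 framed bytes, is padded to 1024, and is carried in two 516-byte blocks; `from_blocks(to_blocks(record))` returns the original bytes.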