Multiprocessor system with synchronization of error recovery to prevent errors spreading

    Publication number: GB2456403A

    Publication date: 2009-07-22

    Application number: GB0822313

    Filing date: 2008-12-08

    Applicant: IBM

    Abstract: A method of operating self-testing logic in a tree-like multi-chip processor cluster which generates an infrastructure signal 430, such as a clock-stop or trace-stop signal, used for error management and recovery. The method intercepts 440 the infrastructure signal of a processor of the cluster, then extracts error information from the infrastructure signal. Using the error information, a pre-defined inter-chip error synchronisation scheme is selected 450, including clock-stop and/or trace-stop information for a respective one of the processors of the cluster. Notification signals are distributed 490 to chips of the cluster, using dedicated wires or a low-level standard interface for chip-to-chip communication, to prepare and execute error-related internal operations for the chips. On receipt of one of the notification signals, a chip performs at least one of (i) a trace-stop command or (ii) a clock-stop command 495 on a respective one of the chips, as derived from the synchronisation scheme. The synchronisation scheme may comprise a configurable delay, adjustable according to the location of the failure within the chip.
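The flow in this abstract (intercept the signal, extract error information, select a pre-defined scheme, distribute per-chip notifications with a configurable delay) can be sketched roughly as follows. This is only an illustration under stated assumptions: the scheme table, the error classes `recoverable` and `fatal`, the chip names, and the delay values are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipAction:
    """What one chip should do when notified (hypothetical structure)."""
    clock_stop: bool
    trace_stop: bool
    delay_cycles: int  # configurable delay before the action is executed


# Hypothetical pre-defined inter-chip synchronisation schemes, keyed by error class.
SCHEMES = {
    "recoverable": {
        "chip0": ChipAction(clock_stop=False, trace_stop=True, delay_cycles=0),
        "chip1": ChipAction(clock_stop=False, trace_stop=True, delay_cycles=2),
    },
    "fatal": {
        "chip0": ChipAction(clock_stop=True, trace_stop=True, delay_cycles=0),
        "chip1": ChipAction(clock_stop=True, trace_stop=True, delay_cycles=4),
    },
}


def extract_error_info(signal):
    """Pull the error class out of an intercepted infrastructure signal."""
    return signal["error_class"]


def build_notifications(signal):
    """Select a scheme and return (delay, chip, commands) tuples in delay order."""
    scheme = SCHEMES[extract_error_info(signal)]
    notifications = []
    for chip, action in scheme.items():
        commands = []
        if action.trace_stop:
            commands.append("trace-stop")
        if action.clock_stop:
            commands.append("clock-stop")
        notifications.append((action.delay_cycles, chip, commands))
    return sorted(notifications)
```

Keeping the delay inside the scheme entry, rather than in the distribution logic, mirrors the abstract's statement that the delay is part of the synchronisation scheme and can be tuned per failure location.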

A method for providing redundancy for channel adapter failure

    Publication number: AU2003226784A1

    Publication date: 2003-10-27

    Application number: AU2003226784

    Filing date: 2003-04-04

    Applicant: IBM

    Abstract: The invention relates to a method for providing improved reliability of any node attaching to an InfiniBand fabric, the method comprising the steps of: a) providing a first and a second physical Channel Adapter having a first and a second number of ports, b) providing program means for registering the first and second physical Channel Adapters as one logical Channel Adapter having a number of first and second ports, c) providing first and second caching means for storing first and second control information for the first and second Channel Adapter, d) providing system memory means for storing first and second control information, and e) providing means for copying the first control information from the system memory to the second caching means in case of a failure of the first Channel Adapter and for initiating an Automatic Path Migration from the first number of ports to the second number of ports.
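Steps (a) to (e) of this abstract can be illustrated with a small model: two physical adapters registered as one logical adapter, control information mirrored in system memory, and a failover that reloads the surviving adapter's cache. The class and method names below are hypothetical, and the switch of the active adapter merely stands in for InfiniBand Automatic Path Migration, which this sketch does not implement.

```python
class PhysicalAdapter:
    """One physical Channel Adapter with its ports and a local cache."""

    def __init__(self, name, ports):
        self.name = name
        self.ports = list(ports)
        self.cache = {}      # caching means for control information
        self.failed = False


class LogicalAdapter:
    """Two physical adapters presented as a single logical Channel Adapter."""

    def __init__(self, first, second):
        self.first = first
        self.second = second
        # System memory keeps a copy of each adapter's control information.
        self.system_memory = {first.name: {}, second.name: {}}
        self.active = first

    @property
    def ports(self):
        # The logical adapter exposes the ports of both physical adapters.
        return self.first.ports + self.second.ports

    def write_control(self, adapter, key, value):
        """Store control information in the adapter cache and in system memory."""
        adapter.cache[key] = value
        self.system_memory[adapter.name][key] = value

    def fail_first(self):
        """Handle failure of the first adapter; return the ports now in use."""
        self.first.failed = True
        # Copy the first adapter's control information from system memory
        # into the second adapter's cache.
        self.second.cache.update(self.system_memory[self.first.name])
        # Stand-in for Automatic Path Migration to the second set of ports.
        self.active = self.second
        return self.second.ports
```

The point of the model is that the failed adapter's cache is never read during failover: recovery relies only on the copy held in system memory.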

(Title unknown; invention patent)

    Publication number: AT468562T

    Publication date: 2010-06-15

    Application number: AT01128821

    Filing date: 2001-12-04

    Applicant: IBM

    Abstract: The method and apparatus relate to hardware-to-hardware data transmission in computer systems, and in particular to a method and system for operating I/O adapters that attach one or more computing devices to an I/O periphery, to a network, or to other computing devices. It is proposed to operate a memory local to the network coupling adapter as a cache memory, relative to a system memory associated with the one or more computing devices, for storing transmission control information.
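The idea of treating adapter-local memory as a cache in front of system memory can be sketched as below. The abstract does not specify a replacement policy, so the least-recently-used eviction used here is an assumption, as are the class name `AdapterControlCache` and the use of a dictionary to represent system memory.

```python
from collections import OrderedDict


class AdapterControlCache:
    """Adapter-local memory acting as a cache for transmission control information."""

    def __init__(self, system_memory, capacity):
        self.system_memory = system_memory  # backing store on the host side
        self.capacity = capacity            # entries that fit in adapter memory
        self.local = OrderedDict()          # adapter-local memory, in LRU order
        self.misses = 0

    def lookup(self, key):
        """Return control information, fetching from system memory on a miss."""
        if key in self.local:
            self.local.move_to_end(key)     # mark as most recently used
            return self.local[key]
        self.misses += 1
        value = self.system_memory[key]
        self.local[key] = value
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used entry
        return value
```

With this arrangement the adapter only needs enough local memory for the control information in active use, while the full set stays in system memory.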

    Block based end-to-end data protection of extended count key data (ECKD)

    Publication number: GB2494037A

    Publication date: 2013-02-27

    Application number: GB201214520

    Filing date: 2012-08-15

    Applicant: IBM

    Abstract: Information units (IUs) are aligned in a block boundary format. Block protection trailer data is added to each one of the IUs. In a data storage and transmission system, information units of variable-length data records such as extended count key data (ECKD) may be stored and transferred according to a fixed-length block protocol such as the T10/SCSI standard. Protection data such as a cyclic redundancy check (CRC) hash is added to the end of each block. Padding data may be added to the variable-length data when forming the fixed-length blocks. The fixed-length blocks may subsequently be re-assembled into ECKD if required.
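A minimal sketch of the block-forming step described in this abstract: a variable-length record is cut into fixed-length blocks, the last block is padded, and a CRC trailer is appended to each block. The 512-byte block size, zero padding, and 4-byte CRC-32 trailer are assumptions chosen for illustration; they are not the actual T10 protection trailer layout, and the function names are hypothetical.

```python
import zlib

BLOCK_DATA_SIZE = 512  # assumed payload bytes per fixed-length block
TRAILER_SIZE = 4       # assumed trailer: CRC-32 of the block payload


def to_protected_blocks(record):
    """Split a record into padded fixed-length blocks, each with a CRC trailer."""
    blocks = []
    # max(..., 1) ensures an empty record still yields one padded block.
    for offset in range(0, max(len(record), 1), BLOCK_DATA_SIZE):
        chunk = record[offset:offset + BLOCK_DATA_SIZE]
        chunk = chunk.ljust(BLOCK_DATA_SIZE, b"\x00")  # pad the final block
        trailer = zlib.crc32(chunk).to_bytes(TRAILER_SIZE, "big")
        blocks.append(chunk + trailer)
    return blocks


def from_protected_blocks(blocks, record_length):
    """Verify each block's CRC and re-assemble the original record."""
    data = bytearray()
    for block in blocks:
        chunk = block[:BLOCK_DATA_SIZE]
        trailer = block[BLOCK_DATA_SIZE:]
        if zlib.crc32(chunk).to_bytes(TRAILER_SIZE, "big") != trailer:
            raise ValueError("block CRC mismatch")
        data += chunk
    # Drop the padding by truncating to the known record length.
    return bytes(data[:record_length])
```

Because padding is indistinguishable from data once it is inside a block, the re-assembly step needs the original record length from elsewhere; in this sketch the caller supplies it.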
