Abstract:
A high performance computing system and method communicate data packets between computing nodes on a multi-lane communications link using a modified header bit encoding. Each data packet is provided with flow control information and error detection information, then divided into per-lane payloads. Sync header bits for each payload are added to the payloads in non-adjacent locations, thereby decreasing the probability that a single correlated burst error will invert both header bits. The encoded blocks that include the payload and the interspersed header bits are then simultaneously transmitted on the multiple lanes for reception, error detection, and reassembly by a receiving computing node.
Abstract:
A high performance computing system and method communicate data packets between computing nodes on a multi-lane communications link using a modified header bit encoding. Each data packet is provided with flow control information and error detection information, then divided into per-lane payloads. Sync header bits for each payload are added to the payloads in non-adjacent locations, thereby decreasing the probability that a single correlated burst error will invert both header bits. The encoded blocks that include the payload and the interspersed header bits are then simultaneously transmitted on the multiple lanes for reception, error detection, and reassembly by a receiving computing node.
Abstract:
A system and method provide a communications link having a plurality of lanes, and an in-band, real-time physical layer protocol that keeps all lanes on-line, while failing lanes are removed, for continuous service during fail over operations. Lane status is monitored real-time at the physical layer receiver, where link error rate, per lane error performance, and other channel metrics are known. If a lane failure is established, a single round trip request/acknowledge protocol exchange with the remote port completes the fail over. If a failing lane meets an acceptable performance level, it remains on-line during the round trip exchange, resulting in uninterrupted link service. Lanes may be brought in or out of service to meet reliability, availability, and power consumption goals.