FAULT TOLERANT COMPUTER MEMORY SYSTEMS AND COMPONENTS EMPLOYING DUAL LEVEL ERROR CORRECTION AND DETECTION WITH LOCK-UP FEATURE

    Publication No.: CA2002362A1

    Publication Date: 1990-09-10

    Application No.: CA2002362

    Application Date: 1989-11-07

    Applicant: IBM

    Abstract: In a memory system comprising a plurality of memory units (10), each of which possesses unit-level error correction capabilities (20) and each of which is tied to a system-level error correction function (30), memory reliability is enhanced by providing means for holding the output of one of the memory units at a fixed value in response to the occurrence of an uncorrectable error in one of the memory units. This counter-intuitive approach of deliberately generating forced hard errors nonetheless enhances overall memory system reliability, since it enables the use of the complement/recomplement algorithm, which depends upon the presence of reproducible errors for proper operation. Thus, chip-level error correction systems, which are increasingly desirable at high packaging densities, are employed in a way that does not interfere with system-level error correction methods.
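
The complement/recomplement procedure named in the abstract can be illustrated with a toy model. The sketch below is not the patented circuit: the `StuckWord` class, its field names, and the 8-bit width are assumptions made for illustration, and the error correction stage is left out. It only shows why the procedure needs reproducible (hard) errors: a bit that reads back unchanged after its complement has been written is flagged as stuck, and recomplementing the second read restores the stuck bits that were in error.

```python
class StuckWord:
    """Hypothetical model of one memory word with stuck-at faults."""

    def __init__(self, width, stuck_mask=0, stuck_value=0):
        self.width = width
        self.mask = (1 << width) - 1
        self.stuck_mask = stuck_mask & self.mask       # 1 = this bit is stuck
        self.stuck_value = stuck_value & self.stuck_mask
        self.cells = 0

    def write(self, value):
        self.cells = value & self.mask

    def read(self):
        # Stuck bits always return their stuck value, whatever was written.
        return (self.cells & ~self.stuck_mask) | self.stuck_value


def complement_recomplement(word):
    """Read, write back the complement, read again, recomplement.

    Returns (recovered_word, hard_fault_positions).
    """
    first = word.read()
    word.write(~first & word.mask)
    second = word.read()
    recovered = ~second & word.mask
    # A healthy bit flips between the two reads; a stuck bit does not.
    hard_positions = ~(first ^ second) & word.mask
    return recovered, hard_positions


# Example: bits 0 and 2 are stuck at 1 while the stored data has 0 there.
w = StuckWord(8, stuck_mask=0b00000101, stuck_value=0b00000101)
w.write(0b10110010)
recovered, hard = complement_recomplement(w)
print(f"{recovered:08b} {hard:08b}")  # prints "10110010 00000101"
```

In this model a stuck bit that happened to agree with the stored data would come back wrong after recomplementing, so the recovered word is only a candidate that still has to pass the error correction check.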

    MEMORY SYSTEM RESTRUCTURED BY DETERMINISTIC PERMUTATION ALGORITHM

    Publication No.: DE3380573D1

    Publication Date: 1989-10-19

    Application No.: DE3380573

    Application Date: 1983-03-10

    Applicant: IBM

    Abstract: Swapping of physical bits between different logical words of a memory (40) is accomplished by reference to data (46) on bad bits in the memory. Different permutation data (34) are selected to control address inputs to each bit position (12i) in a memory word so that any word with multiple uncorrectable data is changed to a correctable logical data word by placing one or more of the bad bits in the original word into another word of the memory. The swapping is done by an exclusionary process (48) which identifies and deselects certain deleterious potential combinations of actual addresses, thereby limiting the selection process to other combinations. The process can involve categorizing (44) failures in accordance with type, and performing (48) algorithm operations which identify combinations of bit addresses that would result in combining the failures so that there are more errors in any memory word than would be correctable by the error correction code monitoring (42) the memory.
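
The idea of steering bad bits into different logical words can be sketched in a few lines. The code below is an illustration under stated assumptions, not the patent's exclusionary process: it assumes each bit position maps a logical address to a physical address by XOR with a per-position value (so the number of addresses must be a power of two), and it uses a simple greedy search. The function name `choose_permutations` and its parameters are hypothetical.

```python
from collections import Counter


def choose_permutations(bad_bits, n_positions, n_addresses, max_errors=1):
    """Pick one XOR permutation value per bit position (assumed scheme).

    bad_bits: iterable of (bit_position, physical_address) pairs.
    Returns a list of permutation values, or None if the greedy search fails.
    """
    per_position = {}
    for pos, addr in bad_bits:
        per_position.setdefault(pos, []).append(addr)

    load = Counter()            # logical address -> bad bits placed so far
    perms = [0] * n_positions
    for pos in range(n_positions):
        addrs = per_position.get(pos, [])
        for p in range(n_addresses):
            # Physical address a appears at logical address a ^ p.
            trial = Counter(a ^ p for a in addrs)
            if all(load[l] + c <= max_errors for l, c in trial.items()):
                perms[pos] = p
                load.update(trial)
                break
        else:
            return None         # every candidate would overload some word
    return perms


# Three bad bits share physical address 3; spread them over three words.
print(choose_permutations([(0, 3), (1, 3), (2, 3)], 4, 4))  # prints [0, 1, 2, 0]
```

A greedy pass can fail where an exhaustive search would succeed, so a `None` result here does not prove that no valid assignment exists.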

    MODULAR DISTRIBUTED ERROR DETECTION AND CORRECTION APPARATUS AND METHOD

    Publication No.: CA1014665A

    Publication Date: 1977-07-26

    Application No.: CA198452

    Application Date: 1974-04-24

    Applicant: IBM

    Abstract: Errors in code words fetched from memory or utilized in some other device are detected by apparatus distributed throughout the memory and then corrected. Illustratively, a 72-bit parallel code word, comprising a 64-bit information portion and an 8-bit check portion, is fetched from the memory. The check bit generator consists of 8 identical modular units which, in the case of use in a memory, can be located at different locations within the memory. The identical modular units are connected in accordance with connections determined by an H matrix. The H matrix is partitioned into eight equal sections, each associated with eight information bits forming a byte and a single check bit. The rows of each partition or section are cyclically permuted from section to section. For example, the first row of the first section becomes the second row of the second section, etc. Each partition of the H matrix contains the same number of 1's and each row within a partition is part of a different code group. Each of the identical modular arrangements contains a logic circuit grouping. The input information byte bits are connected to the circuits of the logic grouping so as to produce as circuit outputs the parities of the part of the code groups in the partition or section associated with the module. The identical modular units also contain circuitry to receive the partial code group parities from the other modular units concerned with the same code group. These partial code group parities and the partial code group parity of the respective module are combined to provide the check bit for the particular module. The partial code group parity outputs from the module are transmitted to the successive other modules to form the partial code group parity inputs for the respective modules.
    After the information has been used, for example by writing it to storage, the information bits and check bits are read into an error detector which compares the check bits generated from the received information bits with the received check bits. An error locator analyzes any mismatch to determine the location of an error. An error corrector then corrects any information or check bit which is identified as incorrect by the error locator. The error detector can consist of the same identical modular units as the check bit generator.
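
The modular check-bit generation described above can be sketched as follows. This is a minimal illustration, not the patent's H matrix: `BASE_ROWS` is an arbitrary placeholder for the first 8x8 section, and no claim is made that it yields a code able to locate errors. The sketch also combines the partial parities in one loop instead of passing them from module to module. What it does show is the structure: eight identical modules, each seeing one byte, with section rows cyclically shifted by one per module.

```python
# Placeholder first section of the H matrix (assumption, one int per row).
BASE_ROWS = [
    0b11111111, 0b00001111, 0b00110011, 0b01010101,
    0b10000000, 0b01000000, 0b00100000, 0b00010000,
]


def parity(x):
    """Parity (XOR) of the bits of x."""
    return bin(x).count("1") & 1


def section_rows(j):
    """Rows of section j: the base rows shifted down cyclically by j."""
    return [BASE_ROWS[(r - j) % 8] for r in range(8)]


def module_partials(j, byte):
    """Partial code group parities produced by module j for its byte."""
    return [parity(row & byte) for row in section_rows(j)]


def check_bits(data_bytes):
    """Combine the partial parities of all 8 modules into 8 check bits."""
    checks = [0] * 8
    for j, byte in enumerate(data_bytes):
        for r, p in enumerate(module_partials(j, byte)):
            checks[r] ^= p
    return checks


def syndrome(data_bytes, stored_checks):
    """Compare regenerated check bits with the stored ones."""
    return [a ^ b for a, b in zip(check_bits(data_bytes), stored_checks)]


data = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
stored = check_bits(data)
print(syndrome(data, stored))  # all zeros when nothing has changed
```

The same functions serve for generation and detection, which mirrors the abstract's remark that the error detector can be built from the same modular units as the check bit generator.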

    APPARATUS FOR CORRECTING TWO GROUPS OF MULTIPLE ERRORS

    Publication No.: CA951434A

    Publication Date: 1974-07-16

    Application No.: CA129691

    Application Date: 1971-12-09

    Applicant: IBM

    Inventor: BOSSEN DOUGLAS C

    Abstract: Apparatus including a decoder adapted for recovering the data from a received message corresponding to the sent message but which may be in error wherein the blocks of data consist of k bytes of data (D0, D1, D2,...Dk-1) each of b bits. The sent message comprises the k bytes of data plus two check bytes C1 and C2, each of b bits. The decoder is effective in recovering the data without error when not more than two of the bytes are in error no matter how many bits may be in error in the two bytes. Pointers are required which indicate the two bytes containing errors. In the absence of the pointers or in the presence of a single false pointer, the decoder is effective in recovering the data without error when not more than a single byte is in error no matter how many bits may be in error in the single byte. The message is encoded by computing the check bytes according to the relationship:
