Abstract:
Embodiments include a system, an apparatus, a device, and a method. An apparatus includes a synchronous circuit including a first subcircuit powered by a first power plane having a first power plane voltage and a second subcircuit powered by a second power plane having a second power plane voltage. The apparatus also includes an error detector operable to detect an incidence of a computational error occurring in the first subcircuit. The apparatus includes a controller operable to change the first power plane voltage based upon the detected incidence of a computational error. The apparatus also includes a power supply configured to electrically couple with a portable power source and operable to provide a selected one of at least two voltages to the first power plane in response to the controller.
Abstract:
Embodiments include a system, a device, and a method. A computing system includes a synchronous circuit. The synchronous circuit includes a first subcircuit powered by a first power plane having a first power plane voltage and a second subcircuit powered by a second power plane having a second power plane voltage. The system also includes an error detector operable to detect an incidence of a computational error occurring in the first subcircuit. The system further includes a controller operable to change the first power plane voltage based upon the detected incidence of a computational error. The system may include a power supply operable to provide a selected one of at least two voltages to the first power plane in response to the controller.
Abstract:
A symmetric multiprocessing fault-tolerant computer system controls memory access in a symmetric multiprocessing computer system. To do so, virtual page structures are created, where the virtual page structures reflect physical page access privileges to shared memory for processors in a symmetric multiprocessing computer system. Access to shared memory is controlled based on physical page access privileges reflected in the virtual paging structures to coordinate deterministic shared memory access between processors in the symmetric multiprocessing computer system. A symmetric multiprocessing fault-tolerant computer system may use duplication or continuous replay.
Abstract:
One embodiment of the present invention provides a system that corrects bit errors in temporary results within a central processing unit (CPU). During operation, the system receives a temporary result during execution of an in-flight instruction. Next, the system generates a parity bit for the temporary result, and stores the temporary result and the parity bit in a temporary register within the CPU. Before the temporary result is committed to the architectural state of the CPU, the system checks the temporary result and the parity bit to detect a bit error. If a bit error is detected, the system performs a micro-trap operation to re-execute the instruction that generated the temporary result, thereby regenerating the temporary result. Otherwise, if a bit error is not detected, the system commits the temporary result to the architectural state of the CPU.
Abstract:
The invention concerns a circuit (3) for detecting errors in data supplied by at least one of the blocks of the assembly. When an error has been detected, the assembly is decontaminated by means of at least a circuit for backup and reconstitution of past states of a latch associated with a block. The backup and reconstitution circuit comprises a multiplexer (6) and a FIFO buffer register (5). The multiplexer includes a first input directly connected to the output of the latch (2a) and a second input connected to said output via the buffer register. A control circuit (4) controls the buffer register and the multiplexer so as to activate the buffer register writing function and to connect the multiplexer output to its first input at each cycle, during a normal operating phase, and to activate the buffer register reading function and connect the multiplexer output to its second input during predetermined cycles of a decontamination phase.
Abstract:
In order to gather, store temporarily and efficiently deliver safestore information in a CPU (10) having data manipulation (40) circuitry including a register bank, first and second serially oriented safestore buffers are employed. At suitable times during the processing of information, a copy of the instantaneous contents of the register bank is transferred into the first safestore buffer. After a brief delay, a copy of the first safestore buffer is transferred into the second safestore buffer. If a call for a domain change (42) (which might include a process change or a fault) is sensed, a safestore frame is sent to cache (3), and the first safestore buffer is loaded from the second safestore buffer rather than from the register bank. Later, during a climb operation, if a restart of the interrupted process is undertaken and the restoration of the register bank is directed to be taken from the first safestore buffer, this source, rather than the safestore frame stored in cache (3), is employed to obtain a corresponding increase in the rate of restart. In one embodiment, the transfer of information between the register bank and the safestore buffers is carried out on a bit-by-bit basis to achieve additional flexibility of operation and also to conserve integrated circuit space.
Abstract:
This fault-tolerant computer system employs multiple (for example, three) identical CPUs executing the same instruction stream, with multiple (for example, two) memory modules which in their address space of the CPUs store duplicates of the same data. The CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of the others until all execute the respective function simultaneously. Interrupts are synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. I/O devices are accessed through a pair of redundant identical I/O processors, but only one is designated to actively control a given I/O device. In case of failure of one I/O processor, however, an I/O device can be accessed by the other one without system shutdown, i.e., by merely redesignating the addresses of the registers of the I/O device under instruction control.
Abstract:
In a multiprocessor system, an error occurring in any one of the CPUs may have an impact upon the operation of the remaining CPUs, and therefore these errors must be handled quickly. The errors are grouped into two categories: synchronous errors (those that must be corrected immediately to allow continued processing of the current instruction); and asynchronous errors (those errors that do not affect execution of the current instruction and may be handled upon completing execution of the current instruction). Since synchronous errors prevent continued execution of the current instruction, it is preferable that the last stable state conditions of the faulting CPU be restored and the faulting instruction reexecuted. These stable state conditions advantageously occur between the execution of each instruction. However, in a pipelined computer system, it is difficult to identify the beginning and ending of a selected instruction since multiple instructions are in process at the same time. Accordingly, the execution unit is selected to be the point of synchronization between error handling and instruction execution. Once the error is identified as asynchronous or synchronous and the execution unit allows the instruction to complete or rolls back the state conditions to their preinstruction values, error analyzing software examines the condition of the suspect data latches in the CPU. A serial diagnostic link stops the system clock of the CPU and serially loads the CPU data latches into the System Processor Unit for error determination. Thereafter, the CPU system clock is restarted and the CPU resumes execution.
Abstract:
Method and apparatus for controlling initiating of bootstrap loading in a computer system having first and second discrete computing zones is disclosed. Each computing zone includes a status register for storing an operating system run (OSR) bit indicating that the zone has initiated bootstrap loading. A cable connects the computing zones to allow the first and second zones to read the status registers in the second and first zones, respectively. A CPU in each zone only enables initiation of bootstrap loading if the OSR bit in the other zone is not set.
Abstract:
In a multiprocessor system, an error occurring in any one of the CPUs may have an impact upon the operation of the remaining CPUs, and therefore these errors must be handled quickly. The errors are grouped into two categories: synchronous errors (those that must be corrected immediately to allow continued processing of the current instruction); and asynchronous errors (those errors that do not affect execution of the current instruction and may be handled upon completing execution of the current instruction). Since synchronous errors prevent continued execution of the current instruction, it is preferable that the last stable state conditions of the faulting CPU be restored and the faulting instruction reexecuted. These stable state conditions advantageously occur between the execution of each instruction. However, in a pipelined computer system, it is difficult to identify the beginning and ending of a selected instruction since multiple instructions are in process at the same time. Accordingly, the execution unit is selected to be the point of synchronization between error handling and instruction execution. Once the error is identified as asynchronous or synchronous and the execution unit allows the instruction to complete or rolls back the state conditions to their preinstruction values, error analyzing software examines the condition of the suspect data latches in the CPU. A serial diagnostic link stops the system clock of the CPU and serially loads the CPU data latches into the System Processor Unit for error determination. Thereafter, the CPU system clock is restarted and the CPU resumes execution.