Abstract:
A method and apparatus for detecting and tolerating situations in which one or more processors in a multi-processor system cannot participate in timer-driven or timer-triggered protocols or event sequences. The multi-processor system includes multiple processors each having a respective memory. These processors are coupled by an inter-processor communication network (preferably consisting of redundant paths). A processor is suspected of having failed outright (ceased operations) or of having a failed timer mechanism when the other processors detect the absence of its periodic "IamAlive" messages. When this happens, all of the processors in the system are subjected to a series of stages in which they repeatedly broadcast their status and their connectivity to each other. During the first such stage, according to the present invention, a processor will not assert its ability to participate unless its timer mechanism is working. It arms a timer expiration event and does not assert its health until and unless that timer expiration event occurs.
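
The first-stage rule described above (arm a timer, assert health only after it expires) can be illustrated with a small simulation. This is a minimal sketch, not the patented implementation: the class `Processor`, its methods, and the simulated `tick` call are all hypothetical names introduced for illustration, and a real system would rely on a hardware timer interrupt rather than an explicit method call.

```python
class Processor:
    """Hypothetical model of one processor during the first regroup stage."""

    def __init__(self, name, timer_works=True):
        self.name = name
        self.timer_works = timer_works  # assumption: lets the demo model a dead timer
        self.timer_armed = False
        self.timer_fired = False

    def enter_stage_one(self):
        # Arm a timer expiration event; health is not asserted yet.
        self.timer_armed = True
        self.timer_fired = False

    def tick(self):
        # Simulated clock event. A failed timer mechanism never fires.
        if self.timer_armed and self.timer_works:
            self.timer_fired = True

    def status_broadcast(self):
        # Health is asserted only once the armed timer has actually expired.
        return {"name": self.name, "healthy": self.timer_fired}


good = Processor("a")
bad = Processor("b", timer_works=False)
for p in (good, bad):
    p.enter_stage_one()
before = good.status_broadcast()["healthy"]
for p in (good, bad):
    p.tick()
after_good = good.status_broadcast()["healthy"]
after_bad = bad.status_broadcast()["healthy"]
```

In this sketch a processor with a broken timer keeps broadcasting `healthy: False`, so the others never count it as a participant.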
Abstract:
A method and apparatus for detecting and tolerating situations in which one or more processors (112a, b, ..., n) in a multi-processor system cannot participate in timer-driven or timer-triggered protocols or event sequences. The multi-processor system includes multiple processors each having a respective memory (118a, b, ..., n). These processors are coupled by an interprocessor communication network (114) (preferably consisting of redundant paths). A processor is suspected of having failed outright (ceased operations) or of having a failed timer mechanism when the other processors detect the absence of its periodic "IamAlive" messages. When this happens, all of the processors in the system are subjected to a series of stages in which they repeatedly broadcast their status and their connectivity to each other. During the first such stage, according to the present invention, a processor will not assert its ability to participate unless its timer mechanism is working. It arms a timer expiration event and does not assert its health until and unless that timer expiration event occurs.
Abstract:
An apparatus and method using an inter-processor lock to control access to inter-process relationship data structures in the memory (3a, 3b, ..., 3n) of each processor (2a, 2b, ..., 2n) in a multiprocessor system (1). The apparatus and method ensure that each inter-process relationship is modified in the same sequence on each processor (2a, 2b, ..., 2n). The apparatus and method also ensure that an inter-process relationship is maintained in a consistent state in the face of failure of any of the processors (2a, 2b, ..., 2n).
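
The ordering guarantee described above can be sketched by serializing every modification through a single lock and applying it to each processor's copy before the lock is released. This is an illustrative sketch only: `Cluster`, `modify`, and `run_demo` are hypothetical names, a `threading.Lock` stands in for the inter-processor lock, and recovery after a processor failure is not modeled.

```python
import threading


class Cluster:
    """Hypothetical model: one replica of the relationship data per processor."""

    def __init__(self, n_processors):
        self.lock = threading.Lock()  # stands in for the inter-processor lock
        self.replicas = [[] for _ in range(n_processors)]

    def modify(self, change):
        # Holding the lock for the whole update means every replica
        # sees the changes in the same sequence.
        with self.lock:
            for replica in self.replicas:
                replica.append(change)


def run_demo(n_processors=3, n_changes=8):
    cluster = Cluster(n_processors)
    threads = [
        threading.Thread(target=cluster.modify, args=(f"change-{i}",))
        for i in range(n_changes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cluster.replicas


replicas = run_demo()
```

The order in which the threads win the lock is not fixed, but whatever order results is identical on every replica.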
Abstract:
An apparatus and method using an inter-processor lock to coordinate signal delivery to a process group whose member processes are distributed across multiple processors. The apparatus and method ensure that each process group member process receives the same signals in the same order and that no signal is duplicated. The apparatus and method also ensure that a partially completed signal delivery is completed even in the face of failure of the signalling processor.
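
One way to picture "completed despite sender failure, with no duplicates" is to tag each signal with a sequence number and have members ignore any number they have already seen. The abstract does not say how duplicates are suppressed, so the sequence-number scheme below is an assumption, and `Member`, `send`, `complete`, and the signal name are hypothetical.

```python
class Member:
    """Hypothetical process-group member that filters repeated deliveries."""

    def __init__(self, name):
        self.name = name
        self.received = []
        self.last_seq = 0

    def deliver(self, seq, signal):
        # Assumption: sequence numbers are assigned under the inter-processor lock.
        if seq <= self.last_seq:
            return False  # already delivered; drop the duplicate
        self.last_seq = seq
        self.received.append(signal)
        return True


def send(members, seq, signal, fail_after=None):
    """Deliver to each member in turn; optionally simulate a sender crash."""
    for count, member in enumerate(members):
        if fail_after is not None and count == fail_after:
            return False  # signalling processor failed mid-delivery
        member.deliver(seq, signal)
    return True


def complete(members, seq, signal):
    """A surviving processor replays the delivery; duplicates are filtered."""
    for member in members:
        member.deliver(seq, signal)


members = [Member(n) for n in ("m1", "m2", "m3")]
finished = send(members, 1, "SIGTERM", fail_after=1)
partial = [m.received[:] for m in members]
complete(members, 1, "SIGTERM")
final = [m.received for m in members]
```

After the replay, every member has received the signal exactly once, including the member that had already received it before the simulated crash.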
Abstract:
A system to determine the group of processors that will survive communications faults and/or timed-event failures in a multi-processor system (100). The processors (112), each having a memory (118) and connected to an interprocessor communication network (114), detect that the set of processors with which they can communicate has changed. They then choose to halt or continue operations based on minimizing the likelihood that disconnected groups of processors will continue to operate as independent systems on the initiation of a regroup operation (622b). A processor is suspected of having failed when other processors detect the absence of a periodic message from the processor (682). When this happens, all of the processors are subjected to a series of stages in which they repeatedly broadcast their status and connectivity to each other (830). The suspected processor does not advance through the stages to regroup if it has ceased operations or if its timer mechanism has failed.
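
The halt-or-continue choice can be illustrated with a simple decision function. The abstract does not state the rule used, so the majority-with-tie-breaker rule below is a swapped-in, commonly used technique rather than the claimed method. The function name `should_continue` and the use of the lowest-numbered processor as tie-breaker are assumptions.

```python
def should_continue(me, reachable, all_processors):
    """Return True if this processor's connected group should keep running.

    Assumed rule (not taken from the abstract): a group survives if it holds
    a strict majority, or exactly half including the lowest-numbered processor.
    """
    group = set(reachable) | {me}
    total = len(all_processors)
    if 2 * len(group) > total:
        return True
    if 2 * len(group) == total:
        return min(all_processors) in group  # tie-breaker
    return False


procs = [0, 1, 2, 3]
majority = should_continue(0, {1, 2}, procs)
isolated = should_continue(3, set(), procs)
tie_with_lowest = should_continue(0, {1}, procs)
tie_without_lowest = should_continue(2, {3}, procs)
```

Under this rule at most one of two disconnected groups continues, which is the property the abstract aims for: disconnected groups should not carry on as independent systems.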
Abstract:
A data processing system for transferring data is provided. This system includes central processing units (CPUs 20, 22, 24 and 26) and storage units (30 and 32 with 100-105 and 110-115) which are interconnected by a network (10). The CPUs (20, 22, 24 and 26) include a request process (133) and a storage process (130). The storage process (130) controls access to the storage unit (30 with 100-105 and 110-115). Software routines (220) are used to provide direct access to the storage unit (30 with 100-105 and 110-115) by the request CPU (22). The request CPU (22) is the CPU containing the request process (133). A virtual memory address for a buffer (160) of the request CPU (22) is created in the request CPU (22). The virtual memory address along with a storage unit access request are sent to the CPU (20) containing the storage process (130). A work request including the virtual memory address is sent from the storage process (130) to the storage unit (30 with 100-105 and 110-115). The data is then transferred directly between the request CPU (22) and the storage unit (30 with 100-105 and 110-115). The storage unit (30 with 100-105 and 110-115) then responds to the work request.
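
The message flow above (buffer address created on the request CPU, request sent to the storage CPU, work request forwarded to the storage unit, data moved directly, reply returned) can be traced with a small in-memory model. This is only a sketch: the classes, the dictionary-based messages, the starting address, and the block number are hypothetical, and the network (10) is replaced by direct method calls.

```python
class RequestCPU:
    """Hypothetical request CPU that owns the destination buffer."""

    def __init__(self):
        self.buffers = {}        # virtual address -> bytearray
        self.next_addr = 0x1000  # arbitrary illustrative starting address

    def create_buffer(self, size):
        addr = self.next_addr
        self.next_addr += size
        self.buffers[addr] = bytearray(size)
        return addr


class StorageUnit:
    """Hypothetical storage unit holding numbered blocks."""

    def __init__(self, blocks):
        self.blocks = blocks     # block number -> bytes

    def handle(self, work, request_cpu):
        data = self.blocks[work["block"]]
        buf = request_cpu.buffers[work["addr"]]
        buf[:len(data)] = data   # direct transfer into the request CPU's buffer
        return {"status": "ok", "length": len(data)}


class StorageCPU:
    """Hypothetical CPU running the storage process; it never touches the payload."""

    def __init__(self, unit):
        self.unit = unit

    def access(self, request, request_cpu):
        # Forward a work request carrying the virtual address to the storage unit.
        work = {"block": request["block"], "addr": request["addr"]}
        return self.unit.handle(work, request_cpu)


unit = StorageUnit({7: b"hello"})
storage_cpu = StorageCPU(unit)
request_cpu = RequestCPU()
addr = request_cpu.create_buffer(5)
reply = storage_cpu.access({"block": 7, "addr": addr}, request_cpu)
contents = bytes(request_cpu.buffers[addr])
```

The storage CPU only relays control information, while the payload moves between the storage unit and the request CPU's buffer.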
Abstract:
An apparatus and method using an inter-processor lock to coordinate signal delivery to a process group (G1) whose member processes (P100, P110, P120) are distributed across multiple processors (2a, 2b, 2c). The apparatus and method ensure that each process group member process receives the same signals in the same order and that no signal is duplicated. The apparatus and method also ensure that a partially completed signal delivery is completed even in the face of failure of the signalling processor.