Abstract:
Exemplary embodiments provide a way to manage data recovery in a distributed system having multiple data store nodes. A storage system comprises: a first node including a first processor; and a plurality of second nodes coupled to the first node, each of the plurality of second nodes including a second processor and one or more second storage devices. The first processor is configured to control storage of data, and of replicas of that data, in the second storage devices of two or more second nodes. If at least one of the second nodes has failed and the storage capacity of the plurality of second nodes is below a given threshold, one of the second nodes is configured to receive first data, which is a replica of data stored in the failed second node, from another of the second nodes, and to create parity data based on the received first data.
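As a rough illustration of the recovery rule in this abstract, the Python sketch below models the second nodes and builds parity once a node has failed and free space is scarce. The abstract does not name the parity code, the capacity metric, or how the parity-building node is chosen, so single-block XOR parity, a free-capacity ratio, and a "most free space" choice are all assumptions, as are the names `Node`, `free_ratio`, and `recover_with_parity`. One plausible reading of the design is that a single parity block consumes less space than re-creating full replicas.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a 'second node': capacity in bytes and stored blocks."""
    name: str
    capacity: int
    alive: bool = True
    blocks: dict = field(default_factory=dict)  # block id -> bytes

    def used(self) -> int:
        return sum(len(b) for b in self.blocks.values())


def xor_parity(chunks):
    """Bytewise XOR of the chunks; shorter chunks are treated as zero-padded."""
    parity = bytearray(max(len(c) for c in chunks))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)


def free_ratio(nodes):
    """Fraction of total capacity still free across the surviving nodes (assumed metric)."""
    live = [n for n in nodes if n.alive]
    total = sum(n.capacity for n in live)
    return (total - sum(n.used() for n in live)) / total if total else 0.0


def recover_with_parity(nodes, failed_block_ids, threshold):
    """Return the name of the node that stored parity, or None if no action was needed."""
    if all(n.alive for n in nodes) or free_ratio(nodes) >= threshold:
        return None  # no failure, or enough free space to re-replicate in full
    live = [n for n in nodes if n.alive]
    # Assumption: the surviving node with the most free space builds the parity.
    target = max(live, key=lambda n: n.capacity - n.used())
    received = []
    for block_id in failed_block_ids:
        holders = [n for n in live if block_id in n.blocks]
        # Prefer receiving the surviving replica from another node, as in the abstract.
        source = next((n for n in holders if n is not target), holders[0])
        received.append(source.blocks[block_id])
    target.blocks["parity"] = xor_parity(received)
    return target.name
```

For example, if two surviving nodes each hold one replica of the failed node's two blocks, the node chosen as target ends up holding the XOR of both blocks in a single parity entry.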
Abstract:
A system and method provide a communications link having a plurality of lanes, and an in-band, real-time physical layer protocol that keeps all lanes on-line while failing lanes are being removed, providing continuous service during fail-over operations. Lane status is monitored in real time at the physical layer receiver, where the link error rate, per-lane error performance, and other channel metrics are known. If a lane failure is detected, a single round-trip request/acknowledge exchange with the remote port completes the fail-over. If a failing lane still meets an acceptable performance level, it remains on-line during the round-trip exchange, so link service is uninterrupted. Lanes may be brought into or out of service to meet reliability, availability, and power consumption goals.
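The fail-over exchange can be sketched in Python as follows. The error-rate thresholds, the `RemotePort` stand-in, the function name `fail_over`, and the policy of taking lanes worse than the acceptable level off-line immediately are hypothetical; the abstract specifies only that failing lanes meeting an acceptable level stay on-line during a single request/acknowledge round trip.

```python
from dataclasses import dataclass

# Hypothetical thresholds (bit error rates); the abstract gives no values.
FAIL_THRESHOLD = 1e-9        # above this, a lane is declared failing
ACCEPTABLE_THRESHOLD = 1e-6  # failing lanes at or below this keep carrying traffic


@dataclass
class Lane:
    index: int
    error_rate: float  # per-lane error rate measured at the physical-layer receiver
    online: bool = True


class RemotePort:
    """Stand-in for the far end of the link: records and acknowledges removal requests."""

    def __init__(self):
        self.requests = []

    def request_removal(self, lane_indices):
        self.requests.append(list(lane_indices))
        return True  # acknowledge


def fail_over(lanes, remote):
    """Remove failing lanes with one request/acknowledge round trip.

    Returns (removed lane indices, lanes that carried traffic during the exchange).
    """
    failing = [l for l in lanes if l.online and l.error_rate > FAIL_THRESHOLD]
    if not failing:
        return [], [l.index for l in lanes if l.online]
    # Lanes worse than the acceptable level go off-line at once (assumed policy);
    # failing lanes that still meet it stay on-line for the round trip.
    for lane in failing:
        if lane.error_rate > ACCEPTABLE_THRESHOLD:
            lane.online = False
    during = [l.index for l in lanes if l.online]
    acked = remote.request_removal([l.index for l in failing])  # single request
    if acked:  # single acknowledge completes the fail-over
        for lane in failing:
            lane.online = False
    return [l.index for l in failing], during
```

In this sketch a marginal lane (here, an error rate between the two thresholds) keeps carrying traffic until the acknowledge arrives, which is what lets the link avoid an outage during the exchange.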
Abstract:
The invention relates to a method for handling faults in a central control unit, the control unit comprising a distributed computer system (100) to which sensors (112, 113, 122, 123) are or can be connected. The distributed computer system (100), in particular all components of the computer system, is divided into a first fault containment unit FCU1 (101) and a second fault containment unit FCU2 (102). FCU1 (101) and FCU2 (102) are each supplied by their own independent power supply and exchange data exclusively over galvanically isolated lines. Some of the sensors are connected at least to FCU1 (101), and the remaining sensors are connected at least to FCU2 (102). FCU1 (101) and FCU2 (102) are connected to one or more actuators via a redundant communication system (131, 132), so that if FCU1 fails, FCU2 maintains restricted functionality using the sensors assigned to FCU2, and if FCU2 fails, FCU1 maintains restricted functionality using the sensors assigned to FCU1.
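A minimal sketch of the degraded-mode rule, assuming each FCU's health is reported as a boolean. The sensor-to-FCU assignment in `SENSORS`, the use of all sensors when both FCUs are healthy, and the `UNAVAILABLE` mode for a double failure are illustrative assumptions not stated in the abstract; `select_mode` is a hypothetical name.

```python
from enum import Enum


class Mode(Enum):
    FULL = "full"
    RESTRICTED = "restricted"
    UNAVAILABLE = "unavailable"  # double failure: outside the abstract's scope


# Hypothetical assignment; the abstract only says the sensors are split between the FCUs.
SENSORS = {"FCU1": frozenset({112, 113}), "FCU2": frozenset({122, 123})}


def select_mode(fcu_ok):
    """fcu_ok maps 'FCU1'/'FCU2' to True if that fault containment unit is healthy."""
    alive = [fcu for fcu in ("FCU1", "FCU2") if fcu_ok.get(fcu, False)]
    if len(alive) == 2:
        # Assumption: with both FCUs healthy, all sensors are used.
        return Mode.FULL, SENSORS["FCU1"] | SENSORS["FCU2"]
    if len(alive) == 1:
        # The surviving FCU keeps restricted functionality using only its own sensors.
        return Mode.RESTRICTED, SENSORS[alive[0]]
    return Mode.UNAVAILABLE, frozenset()
```
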
Abstract:
Jobs submitted to the primary location of a service within a period of time before and/or after a fail-over event are identified and resubmitted to a secondary location of the service. For example, jobs submitted within fifteen minutes before the fail-over event, and jobs submitted to the primary location before the fail-over to the secondary location has completed, are resubmitted at the secondary location. After the fail-over event occurs, the jobs are updated for the secondary location, which takes the place of the primary location of the service. A mapping of job input parameters (e.g., identifiers and/or secrets) from the primary location to the secondary location is used by the jobs when they are resubmitted to the secondary location. Each job determines which changes to make to its job request when it is resubmitted.
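The job-selection window and parameter mapping might look like this in Python. The `Job` type and the function names are hypothetical, the fifteen-minute default comes from the abstract's example, and the sketch applies one shared mapping centrally, whereas the abstract has each job decide its own changes.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass
class Job:
    job_id: str
    submitted_at: datetime
    params: dict  # input parameters, e.g. identifiers or secrets


def jobs_to_resubmit(jobs, failover_event, failover_done, lookback=timedelta(minutes=15)):
    """Select jobs submitted from `lookback` before the fail-over event until fail-over completes."""
    window_start = failover_event - lookback
    return [j for j in jobs if window_start <= j.submitted_at <= failover_done]


def remap_for_secondary(job, mapping):
    """Return a copy of the job with primary-location values swapped for secondary ones."""
    return replace(job, params={k: mapping.get(v, v) for k, v in job.params.items()})
```

With a fail-over event at 12:00 that completes at 12:05, a job submitted at 11:50 or 12:02 is selected, while jobs from 11:40 or 12:10 are not; `remap_for_secondary` leaves the original job unchanged.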
Abstract:
A computer system includes a plurality of interdependent processors. Each interdependent processor executes an independent operating system image without sharing file system state information, and each further has a network access card with a first network connection and a second network connection. The computer system has a first active backplane (146) coupled to the first network connection of each processor; a second active backplane (148) coupled to the second network connection of each processor, the second active backplane (148) operating in lieu of the first active backplane (146) in the event of a fail-over; and one or more peripherals connected to each of the first and second active backplanes and responsive to data requests transmitted over the first and second active backplanes.