Abstract:
For data transmission that makes particularly good use of the available hardware resources, the invention provides a method for data transmission in a redundantly implemented automation system (2) comprising a number of data transmission units (6) and a number of data processing units (4), in which a data flow characterizing the data transmission is continuously monitored on the respective data transmission units (6) and, depending on the data flow, the system is switched between different operating modes such that, when a synchronized data flow is present on at least two data transmission units (6), an operating mode is set that offers higher availability than the operating mode used in the event of a faulty or one-sided data flow on one of the data transmission units (6).
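The switching rule can be read as: classify the flow on each transmission unit and select the higher-availability mode only when at least two units carry a synchronized flow. A minimal Python sketch under that reading; the FlowState and Mode names are illustrative assumptions, not terms from the source:

```python
from enum import Enum, auto


class FlowState(Enum):
    """Hypothetical classification of the data flow observed on a transmission unit."""
    SYNCHRONIZED = auto()
    ONE_SIDED = auto()
    FAULTY = auto()


class Mode(Enum):
    HIGH_AVAILABILITY = auto()   # set when >= 2 units carry a synchronized flow
    DEGRADED = auto()            # fallback for a faulty or one-sided flow


def select_mode(flow_states: list[FlowState]) -> Mode:
    """Choose the operating mode from the continuously monitored flow states."""
    synchronized_units = sum(1 for s in flow_states if s is FlowState.SYNCHRONIZED)
    if synchronized_units >= 2:
        return Mode.HIGH_AVAILABILITY
    return Mode.DEGRADED


# Example: two units with synchronized flow -> high-availability mode.
assert select_mode([FlowState.SYNCHRONIZED, FlowState.SYNCHRONIZED]) is Mode.HIGH_AVAILABILITY
assert select_mode([FlowState.SYNCHRONIZED, FlowState.FAULTY]) is Mode.DEGRADED
```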
Abstract:
Multiple Array Management Functions (80) included in a controller (30) are connected to multiple redundancy groups (40) over a storage area network (SAN), such as a Fibre Channel-based SAN (50). The multiple Array Management Functions (AMFs) share management responsibility for the redundancy groups, each of which typically includes multiple resources spread over multiple disks (45). The AMFs provide concurrent access to the redundancy groups for associated host systems. When a host requests an AMF to perform an operation on a resource, the AMF synchronizes with the other AMFs sharing control of the redundancy group that includes the resource to be operated on, so as to obtain a lock on the resource. While performing the operation, the AMF sends replication data and state information associated with the resource such that, if the AMF fails, any of the other AMFs can complete the operation and maintain data reliability and coherency.
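A simplified Python sketch of the lock-then-replicate sequence described above. The ArrayManagementFunction class, its perform_operation method, and the in-process threading.Lock stand in for the patent's distributed synchronization and are assumptions made for illustration:

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource within a redundancy group, guarded by a shared lock."""
    name: str
    lock: threading.Lock = field(default_factory=threading.Lock)
    state: dict = field(default_factory=dict)


class ArrayManagementFunction:
    """Illustrative AMF that synchronizes with peers before operating on a resource."""

    def __init__(self, name: str, peers: list):
        self.name = name
        self.peers = peers           # other AMFs sharing control of the redundancy group
        self.replica_log: list = []  # replication data received from peers

    def perform_operation(self, resource: Resource, update: dict) -> None:
        # Obtain the lock so no peer operates on the resource concurrently.
        with resource.lock:
            resource.state.update(update)
            # Send replication data/state so a peer can finish the operation on failure.
            for peer in self.peers:
                peer.replica_log.append((resource.name, dict(resource.state)))


# Example: two AMFs sharing one redundancy-group resource.
amf_a = ArrayManagementFunction("AMF-A", peers=[])
amf_b = ArrayManagementFunction("AMF-B", peers=[amf_a])
amf_b.perform_operation(Resource("stripe-7"), {"parity": "rebuilt"})
print(amf_a.replica_log)  # AMF-A now holds the state needed to complete the operation
```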
Abstract:
A computer system includes a plurality of interdependent processors. Each interdependent processor executes an independent operating system image without sharing file system state information, and each interdependent processor further has a network access card with a first network connection and a second network connection. The computer system has a first active backplane (146) coupled to each first network connection of each processor; a second active backplane (148) coupled to each second network connection of each processor, the second active backplane (148) operating in lieu of the first active backplane (146) in case of a fail-over; and one or more peripherals connected to each of the first and second active backplanes and responsive to data requests transmitted over the first and second active backplanes.
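A minimal sketch of operating the second active backplane in lieu of the first on a fail-over; the Backplane class and its send method are hypothetical stand-ins for the hardware paths:

```python
class Backplane:
    """Illustrative backplane; peripherals respond to requests sent over it."""

    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def send(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request!r}"


def send_with_failover(primary: Backplane, secondary: Backplane, request: str) -> str:
    """Use the first backplane; operate the second in lieu of the first on a fail-over."""
    try:
        return primary.send(request)
    except ConnectionError:
        return secondary.send(request)


first, second = Backplane("backplane-146"), Backplane("backplane-148")
first.healthy = False                       # simulate a failure of the first backplane
print(send_with_failover(first, second, "read block 12"))
```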
Abstract:
A method of accessing data stored in a storage disk of a storage system includes the steps of receiving a read operation to a sector of the storage disk and, in response to an error returned from the read operation, determining whether the sector is to be replaced. If it is determined that the sector is to be replaced, the method further includes replacing the sector with a spare sector. The data previously stored at the replaced disk sector are reconstructed and written to the spare sector, and the logical block address (LBA) assigned to the replaced sector is reassigned to the physical block address (PBA) associated with the spare sector.
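A sketch of the replace-and-remap step, assuming a plain dictionary as the LBA-to-PBA mapping table; the variable and function names are illustrative, not the patent's interface:

```python
# Hypothetical LBA -> PBA mapping table and spare-sector pool for a storage disk.
lba_to_pba = {0: 100, 1: 101, 2: 102}
spare_pbas = [900, 901]
pba_data = {100: b"a", 101: b"b", 102: b"c", 900: b"", 901: b""}


def replace_sector(lba: int, reconstructed: bytes) -> None:
    """Replace the failing sector behind `lba` with a spare and remap the LBA."""
    spare = spare_pbas.pop(0)            # pick a spare sector
    pba_data[spare] = reconstructed      # write the reconstructed data to the spare
    lba_to_pba[lba] = spare              # reassign the LBA to the spare's PBA


# Example: a read of LBA 1 returned an error; reconstruct the data (e.g. from
# redundancy elsewhere in the system) and remap.
replace_sector(1, reconstructed=b"b")
assert lba_to_pba[1] == 900 and pba_data[900] == b"b"
```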
Abstract:
A computer-implemented method for providing fault tolerance to a plurality of instances in a system including a plurality of surviving instances includes: determining, for each of the surviving instances, an aggregate load by retrieving the job load of each job assigned to the respective surviving instance and summing the job loads of all of the jobs assigned to that instance; and selecting one of the surviving instances to recover and perform an orphaned job based upon the aggregate loads of the surviving instances.
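The selection step amounts to summing the job loads per surviving instance and choosing an instance based on those sums; picking the least-loaded instance is one plausible reading. An illustrative Python sketch with assumed data structures:

```python
# Hypothetical job assignments: surviving instance -> list of job loads.
assigned_loads = {
    "instance-1": [0.4, 0.3],
    "instance-2": [0.9],
    "instance-3": [0.2, 0.1, 0.1],
}


def aggregate_load(instance: str) -> float:
    """Sum the loads of all jobs assigned to the instance."""
    return sum(assigned_loads[instance])


def select_recovery_instance() -> str:
    """Pick the surviving instance to recover an orphaned job, based on aggregate loads."""
    return min(assigned_loads, key=aggregate_load)


print(select_recovery_instance())  # instance-3 (aggregate load 0.4)
```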
Abstract:
A system, method, and machine-readable storage medium for recovering data in a distributed storage system are provided. In some embodiments, the method includes identifying a failing storage device of a first storage node having an inaccessible data segment. When it is determined that the inaccessible data segment cannot be recovered using a first data protection scheme, a first chunk of data associated with the inaccessible data segment is identified and a group associated with the first chunk of data is identified. A second chunk of data associated with the group is selectively retrieved from a second storage node such that data associated with an accessible data segment of the first storage node is not retrieved. The inaccessible data segment is recovered by recovering the first chunk of data using a second data protection scheme and the second chunk of data.
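A sketch of the two-level recovery flow, assuming for illustration that the second data protection scheme is XOR parity over the chunks of a group; none of these names or the parity layout come from the source:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def recover_segment(local_chunk_readable: bool,
                    local_chunk: bytes | None,
                    group_parity: bytes,
                    remote_group_chunk: bytes) -> bytes:
    """Try the first scheme (read the local chunk); if that fails, selectively
    retrieve only the remote chunk of the same group from the second node and
    rebuild the lost chunk via the group's parity (second scheme)."""
    if local_chunk_readable:                 # first data protection scheme succeeds
        return local_chunk
    # Accessible segments on the first node are not retrieved; only the group peer is.
    return xor_bytes(group_parity, remote_group_chunk)


lost = bytes([0x0F, 0x33])
remote = bytes([0xF0, 0x0F])
parity = xor_bytes(lost, remote)
print(recover_segment(False, None, parity, remote) == lost)  # True
```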
Abstract:
A method, non-transitory computer-readable medium, and apparatus that monitor an active virtual storage controller. A determination of when a failure of the active virtual storage controller has occurred is made based on the monitoring. When the failure of the active virtual storage controller is determined to have occurred, storage devices previously assigned to the active virtual storage controller are remapped to a passive virtual storage controller and transactions in a transaction log are replayed. In another example, active storage controllers are monitored with a passive storage controller. When it is determined, based on the monitoring, that a failure of one of the active storage controllers has occurred, storage devices previously assigned to the active storage controller are remapped, a transaction log associated with the active storage controller is retrieved from a transaction log database, and transactions in the transaction log are replayed.
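A minimal sketch of the failover path: remap the failed controller's devices to the passive controller and replay its transaction log. The dictionary-based controller model and log format are assumptions made for illustration:

```python
def fail_over(active: dict, passive: dict, transaction_log: list[dict]) -> dict:
    """Remap the devices of the failed active controller to the passive controller,
    then replay the logged transactions on the passive side."""
    passive["devices"].extend(active["devices"])      # remap storage devices
    active["devices"].clear()
    for txn in transaction_log:                       # replay transactions
        passive["state"][txn["key"]] = txn["value"]
    return passive


active_ctrl = {"devices": ["disk-0", "disk-1"], "state": {}}
passive_ctrl = {"devices": [], "state": {}}
log = [{"key": "vol-a", "value": "created"}, {"key": "vol-a", "value": "resized"}]
print(fail_over(active_ctrl, passive_ctrl, log))
```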
Abstract:
Apparatuses, systems and methods are disclosed for tolerating fault in a communications grid. Specifically, various techniques and systems are provided for detecting a fault or failure by a node in a network of computer nodes in a communications grid, adjusting the grid to avoid grid failure, and taking action based on the failure. In an example, a system may perform operations including: receiving grid status information at a backup control node, the grid status information including a project status; storing the grid status information within the backup control node; receiving a failure communication including an indication that a primary control node has failed; designating the backup control node as a new primary control node; receiving updated grid status information based on the indication that the primary control node has failed; and transmitting a set of instructions based on the updated grid status information.
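A sketch of the backup control node's take-over sequence as listed above; the BackupControlNode class, its method names, and the instruction format are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class BackupControlNode:
    """Illustrative backup control node in a communications grid."""
    grid_status: dict = field(default_factory=dict)  # stored grid status, incl. project status
    is_primary: bool = False

    def receive_grid_status(self, status: dict) -> None:
        self.grid_status = dict(status)              # store the received grid status

    def on_primary_failure(self, updated_status: dict) -> list[str]:
        """Handle a failure communication: become primary and instruct worker nodes."""
        self.is_primary = True                       # designate as the new primary control node
        self.grid_status = dict(updated_status)      # receive updated grid status
        # Transmit a set of instructions based on the updated grid status.
        return [f"resume {project} at {state}" for project, state in updated_status.items()]


backup = BackupControlNode()
backup.receive_grid_status({"project-42": "checkpoint-3"})
print(backup.on_primary_failure({"project-42": "checkpoint-3"}))
```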
Abstract:
In a computing device supporting a failover in an event stream processing (ESP) system, an event block object is received. A first status of the computing device as active or standby is determined. When the first status is active, a second status of the computing device as newly active or not newly active is determined. Newly active is determined when the computing device is switched from a standby to an active status. When the second status is newly active, a last published event block object identifier that uniquely identifies a last published event block object is determined. A next event block object is selected from a non-transitory computer-readable medium accessible by the computing device. The next event block object has an event block object identifier that is greater than the determined last published event block object identifier. The selected next event block object is published to an out-messaging network device.
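The selection rule for a newly active device is concrete enough to sketch: publish only stored event block objects whose identifiers exceed the last published identifier. An illustrative Python sketch with assumed data structures:

```python
def next_blocks_to_publish(stored_blocks: dict[int, bytes],
                           last_published_id: int) -> list[tuple[int, bytes]]:
    """Select, in order, the event block objects whose identifiers are greater than
    the last published event block object identifier."""
    return sorted((bid, blk) for bid, blk in stored_blocks.items() if bid > last_published_id)


# Example: the standby device becomes newly active; blocks up to id 17 were already
# published by the previously active device.
stored = {16: b"...", 17: b"...", 18: b"e18", 19: b"e19"}
for block_id, block in next_blocks_to_publish(stored, last_published_id=17):
    print(f"publish event block {block_id} to the out-messaging network device")
```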
Abstract:
An example data restoration approach includes loading a replacement storage media upon detecting a media failure in a failed storage media, detecting a request for data originally stored on the failed storage media that is pending restoration to the replacement storage media, and in response to detecting this data request, restoring a data segment associated with the data from a backup to the replacement storage media. The approach further modifies the data segment in the replacement storage media according to archived modifications to the data segment in a log archive and then responds to the data request.
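A sketch of that restore-on-request flow: restore the segment from backup, re-apply archived modifications from the log archive, then answer the request. The names and the dictionary-based media model are assumptions for illustration:

```python
def serve_request(lba: int,
                  backup: dict[int, bytes],
                  log_archive: list[tuple[int, bytes]],
                  replacement_media: dict[int, bytes]) -> bytes:
    """Copy the requested segment from backup to the replacement media if it is still
    pending restoration, apply archived modifications for it, then serve the data."""
    if lba not in replacement_media:              # data is still pending restoration
        replacement_media[lba] = backup[lba]      # restore the segment from backup
        for segment, new_value in log_archive:    # re-apply archived modifications
            if segment == lba:
                replacement_media[lba] = new_value
    return replacement_media[lba]                 # respond to the data request


backup_copy = {7: b"v1"}
archive = [(7, b"v2")]
print(serve_request(7, backup_copy, archive, replacement_media={}))  # b'v2'
```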