Abstract:
A method and apparatus for performing fault tolerance in a fault tolerant computer system comprising: a primary node having a primary node processor; and a secondary node having a secondary node processor, each node further comprising a respective memory and a respective checkpoint shim; each of the primary and secondary nodes further comprising a respective non-virtual operating system (OS), the non-virtual OS comprising a respective network driver, storage driver, and checkpoint engine; the method comprising the steps of: acting upon a request from a client by the respective OS of the primary node and of the secondary node; comparing, by the network driver of the primary node, the result obtained by the OS of the primary node with the result obtained by the OS of the secondary node for similarity; and, if the comparison indicates similarity less than a predetermined amount, informing, by the primary node network driver, the primary node checkpoint engine to begin a checkpoint process.
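The comparison step above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the similarity measure or the predetermined amount, so the `difflib` ratio, the `SIMILARITY_THRESHOLD` value, and the `CheckpointEngine` class are hypothetical stand-ins.

```python
from difflib import SequenceMatcher

# Hypothetical "predetermined amount"; the abstract does not give a value.
SIMILARITY_THRESHOLD = 0.9


class CheckpointEngine:
    """Stand-in for the primary node checkpoint engine (hypothetical)."""

    def __init__(self) -> None:
        self.checkpoints_started = 0

    def begin_checkpoint(self) -> None:
        self.checkpoints_started += 1


def similarity(primary_result: bytes, secondary_result: bytes) -> float:
    """Return a score in [0, 1]; 1.0 means the two results are identical."""
    matcher = SequenceMatcher(None, primary_result, secondary_result, autojunk=False)
    return matcher.ratio()


def compare_and_maybe_checkpoint(
    primary_result: bytes,
    secondary_result: bytes,
    engine: CheckpointEngine,
    threshold: float = SIMILARITY_THRESHOLD,
) -> bool:
    """Ask the engine to begin a checkpoint when similarity falls below the threshold."""
    if similarity(primary_result, secondary_result) < threshold:
        engine.begin_checkpoint()
        return True
    return False
```

In this sketch the function plays the role of the primary node network driver: identical results leave the engine untouched, while divergent results trigger a checkpoint.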
Abstract:
The disclosure relates to a method of checkpointing. The method may include determining, by the primary computer, when to initiate a checkpoint operation; dividing, at the primary computer, checkpoint data into two or more groups, wherein each group includes one or more pages of memory; transmitting a first group to the secondary computer; upon receiving, by the secondary computer, the first group, correlating memory pages in the first group with pages in memory on the secondary computer; determining, at the secondary computer, which bytes of memory pages of the first group differ from the correlated pages stored in memory in the secondary computer; and applying data from the first group by swapping differences between the memory pages of the first group and the correlated memory pages stored in the secondary computer. At least some of these operations are performed in parallel during a subset of the overall checkpoint operation. The simultaneous performance of various memory management checkpoint operations is advantageous in various fault tolerant systems. The differences may be N-byte differences, such as 8-byte differences.
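The difference-and-swap step on the secondary computer can be sketched as below. This is a minimal illustration, assuming that "swapping" means exchanging each differing 8-byte chunk between the received page and the resident page; the function name and the use of `bytearray` pages are hypothetical.

```python
CHUNK = 8  # N-byte comparison unit; the abstract gives 8 bytes as an example


def swap_differences(incoming: bytearray, resident: bytearray, chunk: int = CHUNK) -> list[int]:
    """Exchange every chunk that differs between the two pages.

    Afterwards `resident` holds the checkpointed contents and `incoming`
    holds the previous contents of the differing chunks.
    Returns the byte offsets of the chunks that were swapped.
    """
    if len(incoming) != len(resident):
        raise ValueError("pages must be the same size")
    swapped = []
    for off in range(0, len(resident), chunk):
        new = bytes(incoming[off:off + chunk])
        old = bytes(resident[off:off + chunk])
        if new != old:
            resident[off:off + chunk] = new
            incoming[off:off + chunk] = old
            swapped.append(off)
    return swapped
```

Because only the differing chunks are exchanged, the received buffer ends up holding the prior contents of those chunks, which is one plausible reason to swap rather than overwrite.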
Abstract:
A checkpointing method in a network device fault tolerant system using virtual machines. In one embodiment, the network device has an input port; an output port; an active virtual machine and a standby virtual machine; a network application on the active virtual machine which manipulates data present on the input port and transmits the manipulated data from the output port; a checkpoint engine on the active virtual machine; and an interface agent, on the active virtual machine, having callable functions to move data from the input port to the output port. The method includes the steps of determining, by the checkpoint engine, that a checkpoint is required; requesting, by the checkpoint engine, that the interface agent quiesce itself; and returning, by the interface agent to the network application, an indicator that no packets are available regardless of whether or not packets are arriving at the input port.
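The quiescing behavior of the interface agent can be sketched as follows. The class and method names are hypothetical, and `None` is used here as the "no packets available" indicator, which the abstract does not specify.

```python
from collections import deque
from typing import Optional


class InterfaceAgent:
    """Toy interface agent between the input port and the network application."""

    def __init__(self) -> None:
        self._rx = deque()       # packets that have arrived at the input port
        self._quiesced = False

    def packet_arrived(self, packet: bytes) -> None:
        """Called when a packet arrives; arrivals continue even while quiesced."""
        self._rx.append(packet)

    def quiesce(self) -> None:
        """Requested by the checkpoint engine when a checkpoint is required."""
        self._quiesced = True

    def resume(self) -> None:
        """Assumed counterpart to quiesce(); not described in the abstract."""
        self._quiesced = False

    def receive(self) -> Optional[bytes]:
        """Return the next packet, or None to indicate no packets are available."""
        if self._quiesced or not self._rx:
            return None
        return self._rx.popleft()
```

While quiesced, `receive()` reports that nothing is available even though packets keep queuing, so the network application stops changing state during the checkpoint.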
Abstract:
A method of transferring memory from an active to a standby memory in an FT Server system. The method includes the steps of: reserving a portion of memory using BIOS; loading and initializing an FT Kernel Mode Driver; and loading and initializing an FT Virtual Machine Manager (FTVMM), including the Second Level Address Translation (SLAT) table, into the reserved memory. In another embodiment, the method includes tracking memory accesses using the FTVMM's SLAT in Reserved Memory and tracking “L2” Guest memory accesses by tracking the current Guest's SLAT and intercepting the Hypervisor's writes to the SLAT. In yet another embodiment, the method includes entering Brownout by collecting the D-Bits, invalidating the processor's cached SLAT translation entries, and copying the dirtied pages from the active memory to memory in the second Subsystem. In one embodiment, the method includes entering Blackout and moving the final dirty pages from the active memory to the mirror memory.
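The Brownout and Blackout phases can be illustrated with a small simulation. This is a sketch under heavy assumptions: a Python set stands in for the D-Bits, clearing the set stands in for resetting them and invalidating cached SLAT translations, and the pass limit, the threshold, and the `workload` callback are hypothetical parameters that the abstract does not mention.

```python
from typing import Callable, List, Set


def mirror_memory(
    active: List[bytes],
    mirror: List[bytes],
    dirty: Set[int],
    workload: Callable[[List[bytes], Set[int]], None],
    max_brownout_passes: int = 3,
    blackout_threshold: int = 1,
) -> int:
    """Copy dirty pages from active to mirror memory; return brownout passes used."""
    passes = 0
    for _ in range(max_brownout_passes):
        passes += 1
        snapshot = set(dirty)      # Brownout: collect the dirty bits
        dirty.clear()              # reset them so new writes are tracked afresh
        for page in snapshot:
            mirror[page] = active[page]
        workload(active, dirty)    # the system keeps running and may dirty pages
        if len(dirty) <= blackout_threshold:
            break
    # Blackout: the workload is not run here, so no new pages become dirty
    for page in dirty:
        mirror[page] = active[page]
    dirty.clear()
    return passes
```

The point of the two phases in this sketch is that most pages are copied while the system runs, leaving only a small final set to move during the pause.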
Abstract:
A method of migrating memory from a primary computer to a secondary computer. In one embodiment, the method includes the steps of: (a) waiting for a checkpoint on the primary computer; (b) pausing the primary computer; (c) selecting a group of pages of memory to be transferred to the secondary computer; (d) transferring the selected group of pages of memory and checkpointed data; (e) restarting the primary computer; (f) waiting for a checkpoint on the primary computer; (g) pausing the primary computer; (h) selecting another group of pages of memory to be transferred; (i) transferring the other selected group of pages of memory and data checkpointed since the previous checkpoint to the secondary computer; (j) restarting the primary computer; and (k) repeating steps (f) through (j) until all the memory of the primary computer is transferred.
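Steps (a) through (k) can be sketched as a loop. This is an illustration only: memory is modeled as a Python list of pages, and `run_until_checkpoint` is a hypothetical callback that simulates the primary running until its next checkpoint and reports which pages it dirtied.

```python
from typing import Callable, List, Set


def migrate(
    primary: List[int],
    secondary: List[int],
    group_size: int,
    run_until_checkpoint: Callable[[List[int]], Set[int]],
) -> int:
    """Transfer all pages group by group; return the number of checkpoints used."""
    next_page = 0
    rounds = 0
    while next_page < len(primary):
        # (a)/(f) wait for a checkpoint; the callback returns pages dirtied since the last one
        dirtied = run_until_checkpoint(primary)
        # (b)/(g) the primary is paused here: nothing modifies `primary` during the transfer
        # (c)/(h) select the next group of pages
        group = range(next_page, min(next_page + group_size, len(primary)))
        # (d)/(i) transfer the selected group ...
        for page in group:
            secondary[page] = primary[page]
        # ... and the checkpointed data for pages that were already transferred
        for page in dirtied:
            if page < next_page:
                secondary[page] = primary[page]
        next_page = group.stop
        rounds += 1
        # (e)/(j) restart the primary; (k) repeat until all memory is transferred
    return rounds
```

Pages dirtied before their group is selected need no special handling in this sketch, since they are copied in their current state when the group is transferred.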
Abstract:
In part, the disclosure relates to a method of regulating checkpointing in an active-active fault tolerant system. The method includes receiving a request from a client through a network at a primary computer; copying, by the primary computer, the request from the client to a secondary computer; processing the request from the client, using the primary computer, to generate a primary computer result; processing the copy of the request from the client, using the secondary computer, to generate a secondary computer result; comparing the primary computer result and the secondary computer result to obtain a comparison metric; determining whether a minimum checkpoint interval has been met or exceeded; and, if the minimum checkpoint interval has not been met or exceeded, delaying initiation of a checkpoint process from the primary computer to the secondary computer.
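The interval check can be sketched as a small state machine. The names, the millisecond units, and the rule that a comparison metric above `mismatch_threshold` requests a checkpoint are assumptions made for illustration; the abstract does not define how the metric is interpreted.

```python
from dataclasses import dataclass


@dataclass
class CheckpointRegulator:
    min_interval_ms: int
    mismatch_threshold: float = 0.0   # hypothetical: metrics above this request a checkpoint
    last_checkpoint_ms: int = 0
    pending: bool = False

    def on_comparison(self, metric: float, now_ms: int) -> str:
        """Record the comparison metric for a request, then apply the interval rule."""
        if metric > self.mismatch_threshold:
            self.pending = True
        return self.poll(now_ms)

    def poll(self, now_ms: int) -> str:
        """Return 'idle', 'delayed', or 'checkpoint' for the current time."""
        if not self.pending:
            return "idle"
        if now_ms - self.last_checkpoint_ms < self.min_interval_ms:
            return "delayed"          # minimum interval not yet met: delay the checkpoint
        self.pending = False
        self.last_checkpoint_ms = now_ms
        return "checkpoint"
```

Passing the time in explicitly keeps the sketch deterministic; a real system would read a clock instead.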
Abstract:
In part, the disclosure relates to systems and methods to rapidly copy the computer operating system, drivers, and applications from a source computer to a target computer using a duplication engine. Once the copy is complete, the source computer will resume execution, and the target computer will first alter its configuration (also referred to as a role or personality) and then resume execution conforming to its new configuration as indicated by a profile stored in protected or specialized memory. The profile can be a value, a file, or other memory structure and is protected in the sense that the profile (and/or the region of memory where it is stored) must not be overwritten by a state transfer from the source computer to the target computer.
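The protected-profile idea can be sketched as follows. This is a toy model under stated assumptions: machine state is a dictionary of named regions, the `profile` region and its `hostname` field are hypothetical, and `assume_role` stands in for whatever reconfiguration the target performs.

```python
import copy

# Hypothetical names of regions that a state transfer must never overwrite.
PROTECTED_REGIONS = frozenset({"profile"})


def transfer_state(source: dict, target: dict, protected=PROTECTED_REGIONS) -> None:
    """Copy every region from source to target except the protected ones."""
    for region, contents in source.items():
        if region not in protected:
            target[region] = copy.deepcopy(contents)


def assume_role(target: dict) -> None:
    """Alter the target's configuration as indicated by its own protected profile."""
    target["config"]["hostname"] = target["profile"]["hostname"]
```

After the transfer, the target holds the source's operating state but still has its own profile, which it then uses to take on its new role before resuming.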
Abstract:
In one aspect, the invention relates to a fault tolerant computing system. The system includes a primary virtual machine and a secondary virtual machine, wherein the primary and secondary virtual machines are in communication, wherein the primary virtual machine comprises a first checkpointing engine and a first network interface, wherein the secondary virtual machine comprises a second network interface, and wherein the first checkpointing engine forwards a page of memory of the primary virtual machine to the secondary virtual machine such that the first checkpointing engine can checkpoint the page of memory without pausing the primary virtual machine.
Abstract:
A method and system of checkpointing in a computing system having a primary node and a secondary node are disclosed. In one embodiment, the method includes the steps of determining, by the primary node, to initiate a checkpoint process; sending a notification to the secondary node, by the primary node, of an impending checkpoint process; blocking, by the primary node, I/O requests from the Operating System (OS) that arrive at the primary node after the determination to initiate the checkpoint process; completing, by the primary node, active I/O requests for data received from the OS prior to the determination to initiate the checkpoint process, by accessing the primary node data storage; and upon receiving, by the primary node, a notice of checkpoint readiness from the secondary node, initiating a checkpoint process to move state and data from the primary node to the secondary node.
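The ordering of these steps on the primary node can be sketched as below. All names are hypothetical, storage is modeled as a list, and releasing the blocked I/O once the checkpoint has been taken is an assumption that the abstract does not state.

```python
from collections import deque


class PrimaryNode:
    """Toy model of the primary node's I/O handling around a checkpoint."""

    def __init__(self) -> None:
        self.checkpoint_pending = False
        self.active = []          # I/O accepted before the checkpoint decision
        self.blocked = deque()    # I/O that arrived after the decision
        self.storage = []         # stand-in for primary node data storage
        self.sent = []            # notifications sent to the secondary node

    def submit_io(self, request: str) -> None:
        if self.checkpoint_pending:
            self.blocked.append(request)   # block I/O arriving after the decision
        else:
            self.active.append(request)

    def decide_checkpoint(self) -> None:
        self.checkpoint_pending = True
        self.sent.append("checkpoint-impending")   # notify the secondary node
        self.storage.extend(self.active)           # complete the active I/O
        self.active.clear()

    def on_secondary_ready(self) -> list:
        """Take the checkpoint once the secondary reports readiness."""
        state = list(self.storage)                 # state and data to move
        self.checkpoint_pending = False
        while self.blocked:                        # assumed: release blocked I/O afterwards
            self.active.append(self.blocked.popleft())
        return state
```

Blocking new requests while draining the active ones means the state moved to the secondary reflects every request accepted before the decision and none accepted after it.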