Invention Grant
US08458517B1 System and method for checkpointing state in a distributed system (Status: Active)

System and method for checkpointing state in a distributed system
Abstract:
A system and method are disclosed for recording checkpoints in a distributed system. The distributed system comprises one or more computers implementing a plurality of nodes that coordinate with one another to maintain a shared state of the distributed system. The system chooses a given one of the plurality of nodes to record a checkpoint of the shared state. In response, the given node records the checkpoint by isolating itself from communication with the other nodes, storing the checkpoint, restarting, and attempting to reinitialize its state from the stored checkpoint. Restarting may include deliberately causing a runtime error in the node. If the reinitialization is successful, the node restores communication with the other nodes and indicates to them that the newly stored checkpoint is valid.
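The sequence in the abstract (isolate, store, restart, reinitialize, rejoin, announce) can be illustrated with a small single-process sketch. This is not the patented implementation: the `Node` class, its fields, the JSON file format, and the `demo` helper are all hypothetical, the "restart" is simulated by discarding in-memory state rather than by forcing a runtime error, and the failure path (the node simply stays isolated) is an assumption, since the abstract does not describe it.

```python
import json
import os
import tempfile


class Node:
    """Hypothetical in-process stand-in for one node of the distributed system."""

    def __init__(self, name):
        self.name = name
        self.peers = []               # other Node objects (assumed topology)
        self.state = {}               # this node's copy of the shared state
        self.connected = True         # False while isolated from the peers
        self.valid_checkpoint = None  # path of the last checkpoint known to be valid

    def restart_from(self, path):
        # Simulated restart: drop all in-memory state, then reload from disk.
        self.state = {}
        with open(path, "r", encoding="utf-8") as f:
            self.state = json.load(f)

    def record_checkpoint(self, directory):
        expected = dict(self.state)
        self.connected = False                        # 1. isolate from the other nodes
        path = os.path.join(directory, self.name + ".ckpt.json")
        with open(path, "w", encoding="utf-8") as f:  # 2. store the checkpoint
            json.dump(expected, f)
        try:
            self.restart_from(path)                   # 3. restart, 4. reinitialize
            ok = self.state == expected
        except (OSError, ValueError):
            ok = False                                # unreadable or corrupt checkpoint
        if ok:
            self.connected = True                     # 5. restore communication
            for node in [self] + self.peers:          # 6. announce the valid checkpoint
                node.valid_checkpoint = path
        # Assumption: on failure the node stays isolated and announces nothing.
        return ok


def demo():
    """Run one checkpoint round between two nodes in a temporary directory."""
    a, b = Node("a"), Node("b")
    a.peers = [b]
    a.state = {"epoch": 3, "leader": "a"}
    with tempfile.TemporaryDirectory() as directory:
        ok = a.record_checkpoint(directory)
    return ok, a, b
```

The point of the round trip is that a checkpoint is announced as valid only after a node has actually booted from it, so a corrupt or incomplete file is caught before any other node relies on it.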