Invention Grant
- Patent Title: Distributed, fault-tolerant and highly available computing system
- Patent Title (中): 分布式,容错和高可用性的计算系统
- Application No.: US11740556
- Application Date: 2007-04-26
- Publication No.: US07937618B2
- Publication Date: 2011-05-03
- Inventor: Chitra Dorai, Robert E. Strom, Huining Feng
- Applicant: Chitra Dorai, Robert E. Strom, Huining Feng
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Scully, Scott, Murphy & Presser, P.C.
- Agent: Stephen C. Kaufman, Esq.
- Main IPC: G06F11/00
- IPC: G06F11/00

Abstract:
A method and system for achieving highly available, fault-tolerant execution of components in a distributed computing system, without requiring the writer of these components to explicitly write code (such as entity beans or database transactions) to make component state persistent. This is achieved by converting the intrinsically non-deterministic behavior of the distributed system into deterministic behavior, which enables state recovery through efficient checkpoint-replay techniques. The method comprises: adapting the execution environment to enable message communication among the components; automatically associating a deterministic timestamp with each message to be communicated from a sender component to a receiver component during program execution, the timestamp representing the estimated time of arrival of the message at the receiver component; and, at each component, tracking the state of that component during program execution and periodically checkpointing the state in a local storage device. Upon failure of a component, the component state is restored by recovering a recent stored checkpoint and re-executing the events that occurred since that checkpoint. Execution is deterministic because the receiving component, when its execution is repeated, processes messages in the order of their associated timestamps.
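The checkpoint-replay idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Message` and `Component` classes, the `checkpoint_every` parameter, and the doubling update rule are all hypothetical names and choices made for illustration. The sketch also assumes that every message has already arrived before processing starts and that the log of messages handled since the last checkpoint survives a crash (for example because senders retain them); the abstract does not say how either is achieved.

```python
import copy
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    timestamp: int                       # deterministic virtual time (estimated arrival)
    sender: str                          # tie-breaker so the ordering is total
    payload: int = field(compare=False)  # excluded from ordering


class Component:
    """Hypothetical component whose state depends on message order."""

    def __init__(self, checkpoint_every=3):
        self.state = {"total": 0, "history": []}
        self.pending = []                # min-heap ordered by (timestamp, sender)
        self.processed = 0
        self.checkpoint_every = checkpoint_every
        # In-memory stand-in for the local storage device.
        self.checkpoint = (0, copy.deepcopy(self.state))
        self.log = []                    # messages processed since the last checkpoint

    def receive(self, msg):
        heapq.heappush(self.pending, msg)

    def _apply(self, msg):
        # Order-sensitive update, so a different order gives a different state.
        self.state["total"] = self.state["total"] * 2 + msg.payload
        self.state["history"].append(msg.payload)
        self.processed += 1

    def run(self):
        # Assumption: all messages have arrived, so draining the heap
        # processes them strictly in timestamp order.
        while self.pending:
            msg = heapq.heappop(self.pending)
            self._apply(msg)
            self.log.append(msg)
            if self.processed % self.checkpoint_every == 0:
                self.checkpoint = (self.processed, copy.deepcopy(self.state))
                self.log.clear()

    def crash_and_recover(self):
        # Assumption: the post-checkpoint log is still available after the crash.
        replay = list(self.log)
        self.processed, saved = self.checkpoint
        self.state = copy.deepcopy(saved)
        self.log = []
        for msg in replay:               # re-execute in the original timestamp order
            self._apply(msg)
            self.log.append(msg)


def demo():
    msgs = [Message(30, "B", 3), Message(10, "A", 1), Message(20, "B", 2),
            Message(40, "A", 4), Message(50, "B", 5)]

    reference = Component()              # never fails
    for m in msgs:
        reference.receive(m)
    reference.run()

    failing = Component()                # same messages, different arrival order
    for m in reversed(msgs):
        failing.receive(m)
    failing.run()
    failing.crash_and_recover()

    return reference.state, failing.state
```

Calling `demo()` returns the state of a component that never failed and the state of one that received the same messages in a different arrival order, crashed, and recovered. Because both process messages by timestamp rather than by arrival order, the two states are equal.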
Public/Granted literature
- US20080270838A1 DISTRIBUTED, FAULT-TOLERANT AND HIGHLY AVAILABLE COMPUTING SYSTEM, Public/Granted day: 2008-10-30