Invention Grant
- Patent Title: Failure recovery resolution in transplanting high performance data intensive algorithms from cluster to cloud
- Application No.: US14555285
- Application Date: 2014-11-26
- Publication No.: US09626261B2
- Publication Date: 2017-04-18
- Inventor: Da Qi Ren, Zhulin Wei
- Applicant: Futurewei Technologies, Inc.
- Applicant Address: US TX Plano
- Assignee: FUTUREWEI TECHNOLOGIES, INC.
- Current Assignee: FUTUREWEI TECHNOLOGIES, INC.
- Current Assignee Address: US TX Plano
- Agency: Futurewei Technologies, Inc.
- Main IPC: G06F11/20
- IPC: G06F11/20 ; G06F11/14 ; H04L29/08 ; G06F11/07 ; H04L12/26

Abstract:
A method of providing failure recovery capabilities to a cloud environment for scientific high-performance computing (HPC) applications. An HPC application implemented with the Message Passing Interface (MPI) extends the class of MPI programs so that the application is embedded with various degrees of fault tolerance. An MPI fault tolerance mechanism realizes a recover-and-continue solution: if an error occurs, only the failed processes are re-spawned, while the surviving processes remain on their original processors/nodes, which minimizes system recovery cost.
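
The recover-and-continue idea in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the patented mechanism and not MPI code: the `Proc` record, the `recover_and_continue` function, and the policy of placing a re-spawned rank on a spare node are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Toy stand-in for one process (rank) of a parallel job."""
    rank: int
    node: str
    alive: bool = True
    incarnation: int = 0  # number of times this rank has been re-spawned


def recover_and_continue(procs, spare_nodes):
    """Re-spawn only the failed ranks; leave surviving ranks untouched.

    Assumption: a failed rank is restarted on the next spare node.
    The abstract does not say where re-spawned processes are placed.
    Returns the list of ranks that were re-spawned.
    """
    respawned = []
    for p in procs:
        if not p.alive:
            p.node = spare_nodes.pop(0)  # hypothetical placement policy
            p.alive = True
            p.incarnation += 1
            respawned.append(p.rank)
    return respawned


# Four ranks, one per node; rank 2 fails.
procs = [Proc(rank=r, node=f"node{r}") for r in range(4)]
procs[2].alive = False

# Record where the survivors are before recovery.
placement_before = {p.rank: p.node for p in procs if p.alive}

respawned = recover_and_continue(procs, spare_nodes=["spare0"])
```

In this model the recovery cost scales with the number of failed ranks rather than the job size: the three survivors keep their node assignment and `incarnation` count, and only rank 2 is restarted.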