Invention Grant
US08713352B2 Method, system and program for securing redundancy in parallel computing system
Expired
- Patent Title: Method, system and program for securing redundancy in parallel computing system
- Application No.: US11608331
- Application Date: 2006-12-08
- Publication No.: US08713352B2
- Publication Date: 2014-04-29
- Inventor: Masakuni Okada, Fumitomo Ohsawa, Yoshiko Ishii, Naoki Matsuo
- Applicant: Masakuni Okada, Fumitomo Ohsawa, Yoshiko Ishii, Naoki Matsuo
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Grant A. Johnson
- Priority: JP2005-369863 (2005-12-22)
- Main IPC: G06F11/00
- IPC: G06F11/00

Abstract:
In a parallel computing system having a plurality of computing node groups, including at least one spare computing node group, a plurality of managing nodes that allocate jobs to the computing node groups and an information management server that holds status information for each computing node group are associated with the computing node groups. Each managing node updates the status information of the computing node group it is using by accessing the information management server. When a managing node detects a failure, the managing node that was using the computing node group disabled by the failure identifies a spare computing node group by accessing the computing node group status information in the information management server. That managing node then obtains the computing node group information of the identified spare computing node group. On the basis of that information, it switches the computing node group to be used from the disabled group to the identified spare group and continues processing, thereby securing redundancy in the parallel computing system.
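The failover sequence described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names (`InformationManagementServer`, `ManagingNode`), the three status values, the method names, and the in-memory dictionary are all assumptions made for illustration, and a real system would access the server over a network and handle concurrent managing nodes.

```python
from dataclasses import dataclass, field
from enum import Enum


class GroupStatus(Enum):
    # Hypothetical status values; the abstract only speaks of "status information".
    IN_USE = "in_use"
    SPARE = "spare"
    DISABLED = "disabled"


@dataclass
class InformationManagementServer:
    """Holds the status of every computing node group (assumed in-memory layout)."""
    status: dict = field(default_factory=dict)  # group id -> GroupStatus

    def update(self, group_id, new_status):
        self.status[group_id] = new_status

    def find_spare(self):
        """Return the id of the first spare group, or None if none is left."""
        for group_id, group_status in self.status.items():
            if group_status is GroupStatus.SPARE:
                return group_id
        return None


@dataclass
class ManagingNode:
    """Allocates jobs to one computing node group at a time."""
    name: str
    server: InformationManagementServer
    group_id: str

    def on_failure(self):
        """Switch from the disabled group to a spare group and return its id."""
        # Record that the group this node was using is now disabled.
        self.server.update(self.group_id, GroupStatus.DISABLED)
        # Identify a spare group from the server's status information.
        spare = self.server.find_spare()
        if spare is None:
            raise RuntimeError("no spare computing node group available")
        # Claim the spare group and continue processing on it.
        self.server.update(spare, GroupStatus.IN_USE)
        self.group_id = spare
        return spare


# Usage: two groups in use, one spare; group "A" fails.
server = InformationManagementServer(
    {"A": GroupStatus.IN_USE, "B": GroupStatus.IN_USE, "S": GroupStatus.SPARE}
)
node_a = ManagingNode("manager-1", server, "A")
node_a.on_failure()
```

After the call, `node_a` uses group "S", group "A" is marked disabled, and no spare remains, so a second failure would raise `RuntimeError` in this sketch.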
Public/Granted literature
- US20070180288A1 METHOD, SYSTEM AND PROGRAM FOR SECURING REDUNDANCY IN PARALLEL COMPUTING SYTEM (Public/Granted day: 2007-08-02)