Clustered fault tolerance systems and methods using load-based failover
Abstract:
A computer-implemented method for providing fault tolerance to a plurality of machines includes: determining an aggregate load for each surviving machine of a plurality of surviving machines; calculating a recovery load of one or more orphaned jobs resulting from a terminating event; and selecting one or more of the orphaned jobs to be recovered and performed by one or more of the surviving machines, based upon (i) the recovery load of the one or more orphaned jobs, (ii) the job load of the one or more orphaned jobs, and (iii) the aggregate loads of the surviving machines.
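The selection step can be illustrated with a minimal sketch. The abstract does not say how the three inputs are combined, so the policy below is an assumption: a greedy rule that places the costliest orphaned job first on whichever surviving machine currently has the lowest aggregate load. The names `OrphanedJob` and `assign_orphans`, and the choice to add recovery load and job load into a single cost, are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class OrphanedJob:
    name: str
    job_load: float       # load of running the job once it has been recovered
    recovery_load: float  # extra load of recovering it after the terminating event


def assign_orphans(aggregate_loads, orphans):
    """Greedily assign orphaned jobs to surviving machines (assumed policy).

    aggregate_loads maps each surviving machine to its current aggregate load.
    Returns (assignment, projected_loads); the input mapping is not modified.
    """
    loads = dict(aggregate_loads)  # work on a copy
    assignment = {}
    # Place the costliest jobs first so they land on the least-loaded machines.
    by_cost = sorted(orphans, key=lambda j: j.job_load + j.recovery_load, reverse=True)
    for job in by_cost:
        # Least-loaded surviving machine; ties are broken by machine name.
        target = min(loads, key=lambda m: (loads[m], m))
        assignment[job.name] = target
        loads[target] += job.job_load + job.recovery_load
    return assignment, loads


if __name__ == "__main__":
    survivors = {"A": 50, "B": 20, "C": 70}
    orphans = [OrphanedJob("j1", 30, 10), OrphanedJob("j2", 10, 5)]
    print(assign_orphans(survivors, orphans))
```

A real system could just as well weight recovery load differently from job load, or decline to recover a job when every surviving machine would exceed a capacity limit; the sketch only shows one way the three quantities named in the abstract might feed a single decision.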