Automated management of a distributed computing system
Abstract:
A system, method, and computer program product are provided for managing a distributed computing system that features multiple hosts executing a distributed application. On each host, a collector process collects application-level and/or system-level metrics and reports them to a data repository. A controller executes actor processes that compare the metrics, and/or trends in the metrics, to predetermined thresholds. If a threshold is met or passed, the corresponding actor or the controller initiates one or more remedy processes that take action intended to alleviate the condition detected by the actor. When a remedy is triggered, the controller takes a snapshot of the system to identify its current state, and saves information indicating how well the executed remedies corrected the situation. When a new snapshot matches an existing snapshot, the controller uses the saved information to determine which remedies to apply to this recurrence of the same state.
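The collector/actor/remedy/snapshot loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Every name here is hypothetical: `Controller`, `Actor`, `report`, `snapshot`, `rank_remedies`, `evaluate`, `record_outcome`, the metric `cpu_pct`, and the remedy names. Several further assumptions apply. An actor fires when the latest sample meets or exceeds an upper threshold, and trend checks are omitted. A "snapshot" is encoded as the sorted set of (host, metric) pairs currently over threshold. Remedy effectiveness is a caller-supplied score in [0, 1].

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical state encoding: sorted (host, metric) pairs that are over threshold.
Snapshot = Tuple[Tuple[str, str], ...]


@dataclass
class Actor:
    # Hypothetical actor: fires when the latest sample of `metric` meets or passes `threshold`.
    metric: str
    threshold: float
    remedies: List[str]  # remedy names in their default (configured) order

    def triggered(self, samples: List[float]) -> bool:
        return bool(samples) and samples[-1] >= self.threshold


@dataclass
class Controller:
    actors: List[Actor]
    # Data repository: host -> metric name -> time-ordered samples reported by collectors.
    repo: Dict[str, Dict[str, List[float]]] = field(default_factory=dict)
    # Remedy history: snapshot -> remedy name -> effectiveness scores seen in that state.
    history: Dict[Snapshot, Dict[str, List[float]]] = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(list))
    )

    def report(self, host: str, metric: str, value: float) -> None:
        """Called by a host's collector process to store one metric sample."""
        self.repo.setdefault(host, {}).setdefault(metric, []).append(value)

    def snapshot(self) -> Snapshot:
        """Capture the current system state as the set of over-threshold (host, metric) pairs."""
        return tuple(sorted(
            (host, actor.metric)
            for host, metrics in self.repo.items()
            for actor in self.actors
            if actor.triggered(metrics.get(actor.metric, []))
        ))

    def rank_remedies(self, actor: Actor, snap: Snapshot) -> List[str]:
        """Order an actor's remedies, preferring those that worked best in a matching snapshot."""
        past = self.history.get(snap, {})  # .get() does not insert into the defaultdict

        def score(remedy: str) -> float:
            scores = past.get(remedy)
            return mean(scores) if scores else 0.0

        # sorted() is stable even with reverse=True, so ties keep their configured order.
        return sorted(actor.remedies, key=score, reverse=True)

    def evaluate(self) -> List[Tuple[str, str, List[str]]]:
        """Run every actor on every host; return (host, metric, ordered remedies) per trigger."""
        snap = self.snapshot()
        actions = []
        for host, metrics in self.repo.items():
            for actor in self.actors:
                if actor.triggered(metrics.get(actor.metric, [])):
                    actions.append((host, actor.metric, self.rank_remedies(actor, snap)))
        return actions

    def record_outcome(self, snap: Snapshot, remedy: str, effectiveness: float) -> None:
        """Save how well a remedy corrected the condition observed in `snap`."""
        self.history[snap][remedy].append(effectiveness)


if __name__ == "__main__":
    ctl = Controller(actors=[Actor("cpu_pct", 90.0, ["restart_worker", "add_host"])])
    ctl.report("host-a", "cpu_pct", 95.0)
    snap = ctl.snapshot()
    print(ctl.evaluate())  # no history yet: configured order
    ctl.record_outcome(snap, "restart_worker", 0.2)
    ctl.record_outcome(snap, "add_host", 0.9)
    print(ctl.evaluate())  # same state recurs: add_host is now preferred
```

Keying the remedy history on the snapshot is one way to realize "when a new snapshot matches an existing snapshot": an exact-match dictionary lookup. A real system would likely need a richer state description or a similarity measure rather than exact equality.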