Abstract:
A system and method for fault tolerance of distributed applications in a virtualized environment is provided, utilizing an Application Agent (AA) of an application Peer-to-Peer (P2P) overlay network. The method of the present invention includes the step of pre-deploying Virtual Machine (VM) images, in which a User executes an application by invoking the Application Agent (502) and the Application Agent (AA) contacts the nearest front-end node (504). Upon receiving a response from the front-end node, the Application Agent (AA) requests deployment of virtual machines (VMs) based on task requirements (506). Thereafter, a structured overlay network is formed from the virtual machines (VMs) allocated by the front-end node (508), and the Application Agent (AA) further tracks the status of the virtual machines (VMs). Upon successful deployment of the virtual machine (VM) images, tasks are spawned during execution of the application (404), and computational tasks and data items are replicated in a DHT (Distributed Hash Table)-based peer-to-peer (P2P) overlay network with low overhead. Computational tasks are allocated to virtual machines (VMs) (406), and completed tasks are registered accordingly (408) upon successful allocation of said tasks. Further, the Application Agent (AA) retrieves the output data of each completed task. Pre-deployment of virtual machine (VM) images enables the Application Agent (AA) to initiate deployment of virtual machines (VMs) based on task requirements and to track the deployment status of the VMs. Further, Distributed Hash Tables (DHTs) are leveraged to provide long-term fault tolerance, which enables remote computational steering without advance reservation.
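By way of illustration only, the replication of tasks in a DHT-based overlay can be sketched as follows. The abstract does not name a specific DHT; this sketch assumes a Chord-style consistent-hash ring in which each task is stored on its first three successor VMs, so that the loss of one VM does not lose the task. The names `OverlayRing`, `ring_id`, `replicas` and the key format `task-42:output` are hypothetical and are not taken from the disclosure.

```python
import bisect
import hashlib


def ring_id(key: str, bits: int = 32) -> int:
    """Map a key (VM name or task id) to a position on a 2**bits identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class OverlayRing:
    """Hypothetical structured overlay of VMs. Each key is stored on the first
    `replicas` live successor VMs of its ring position (successor replication)."""

    def __init__(self, vms, replicas=3):
        self.replicas = replicas
        self.nodes = sorted((ring_id(vm), vm) for vm in vms)
        self.store = {vm: {} for vm in vms}  # per-VM key/value store

    def successors(self, key):
        """Return up to `replicas` distinct live VMs clockwise from the key."""
        ids = [nid for nid, _ in self.nodes]
        start = bisect.bisect_left(ids, ring_id(key))
        count = min(self.replicas, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)][1] for i in range(count)]

    def put(self, key, value):
        """Replicate a task description or output on every successor VM."""
        for vm in self.successors(key):
            self.store[vm][key] = value

    def get(self, key):
        """Read from the first successor that still holds a replica."""
        for vm in self.successors(key):
            if key in self.store[vm]:
                return self.store[vm][key]
        return None

    def fail(self, vm):
        """Remove a crashed VM; its data survives on the remaining successors."""
        self.nodes = [(nid, name) for nid, name in self.nodes if name != vm]
        del self.store[vm]


ring = OverlayRing([f"vm-{i}" for i in range(5)], replicas=3)
ring.put("task-42", {"input": [1, 2, 3]})           # replicate the spawned task
worker = ring.successors("task-42")[0]               # allocate it to the primary VM
ring.fail(worker)                                    # the primary VM crashes
task = ring.get("task-42")                           # a surviving replica still holds it
new_worker = ring.successors("task-42")[0]           # re-allocate to the next live VM
ring.put("task-42:output", sum(task["input"]))       # register the completed output
print(new_worker != worker, ring.get("task-42:output"))  # True 6
```

Because the surviving replicas remain the nearest successors of the key after a single failure, the task can be re-allocated without any reservation of spare capacity in advance; a fuller design would also re-replicate onto the new successor to restore the replica count.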
Abstract:
There is disclosed a cloud computing management system comprising at least one computing resource (10) managed by a Log Manager (11), which is adapted and configured to collect, parse, analyse and visualize information relating to a failure within the cloud computing system and to generate an output. The system further comprises at least one log collector (12) connected to the computing resource clusters (100) and configured to collect and gather information related to the failure, store the information in a database (40) and forward the information to the Log Manager (11). The system further provides an analytical business intelligence module (13) with a dashboard (14) that serves as an interface for displaying the output of the Log Manager (11). A corresponding method is also provided.
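By way of illustration only, the flow from log collector (12) to database (40) to Log Manager (11) to dashboard output can be sketched as below. The log line format, the use of an in-memory SQLite table as the database, and the class and method names (`LogCollector.collect`, `LogManager.ingest`, `LogManager.summary`) are all assumptions made for this sketch; the disclosure does not specify them.

```python
import re
import sqlite3
from collections import Counter

# Assumed (hypothetical) log line format: "<timestamp> <cluster> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(\S+) (\S+) (ERROR|WARN|INFO) (.*)$")


class LogManager:
    """Hypothetical Log Manager (11): parses forwarded lines and analyses failures."""

    def __init__(self):
        self.failures = []

    def ingest(self, lines):
        for line in lines:
            m = LOG_PATTERN.match(line)
            if m and m.group(3) == "ERROR":  # treat ERROR entries as failures
                ts, cluster, _, msg = m.groups()
                self.failures.append({"time": ts, "cluster": cluster, "message": msg})

    def summary(self):
        """Output for the dashboard (14): failure count per cluster."""
        return Counter(f["cluster"] for f in self.failures)


class LogCollector:
    """Hypothetical log collector (12): stores raw lines in the database (40)
    and forwards them to the Log Manager (11)."""

    def __init__(self, db, manager):
        self.db = db
        self.manager = manager

    def collect(self, lines):
        self.db.executemany("INSERT INTO raw_logs(line) VALUES (?)", [(line,) for line in lines])
        self.db.commit()
        self.manager.ingest(lines)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_logs(line TEXT)")
manager = LogManager()
collector = LogCollector(db, manager)
collector.collect([
    "2024-01-01T10:00:00 cluster-a ERROR disk full on node-3",
    "2024-01-01T10:00:05 cluster-a INFO heartbeat ok",
    "2024-01-01T10:01:00 cluster-b ERROR VM boot timeout",
    "2024-01-01T10:02:00 cluster-a ERROR disk full on node-3",
])
print(dict(manager.summary()))  # {'cluster-a': 2, 'cluster-b': 1}
```

Keeping the raw lines in the database while forwarding them to the Log Manager lets the analysis be re-run later, for example by the business intelligence module (13), without re-collecting logs from the clusters.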