Abstract:
The present invention extends to methods, systems, and computer program products for increasing coordination service reliability. A coordination service maintains state (e.g., using replication) for one or more software components (e.g., applications). Tokens can be used to identify incarnations of a member set within the coordination service. When a member starts and has no token, the member attempts to learn the token from a majority of the other members. If no such token exists, the member requests a new token. Aspects of the invention can be used to detect and compensate for lost state within the coordination service, including state lost due to storage device failures (which may be referred to as “silent data loss”). Detecting and compensating for silent data loss makes the coordination service more reliable and can essentially guarantee that the coordination service notifies clients when data is lost and ceases processing when incorrect state may exist.
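The token-learning step described in this abstract could be sketched as follows. This is a minimal illustration only: the names `learn_token` and `start_member`, the representation of tokens as strings, the strict-majority rule, and the UUID-based stand-in for requesting a new token are all assumptions that do not come from the abstract.

```python
import uuid
from collections import Counter
from typing import Callable, Optional, Sequence


def learn_token(peer_tokens: Sequence[Optional[str]]) -> Optional[str]:
    """Return the token reported by a strict majority of peers, else None.

    Each entry is one peer's answer; None means the peer has no token
    (or did not answer). The strict-majority rule is an assumption.
    """
    counts = Counter(t for t in peer_tokens if t is not None)
    if not counts:
        return None
    token, votes = counts.most_common(1)[0]
    return token if votes > len(peer_tokens) // 2 else None


def start_member(
    own_token: Optional[str],
    peer_tokens: Sequence[Optional[str]],
    request_new_token: Callable[[], str] = lambda: uuid.uuid4().hex,
) -> str:
    """Pick the token a starting member should use."""
    if own_token is not None:
        return own_token  # the member already knows its incarnation
    learned = learn_token(peer_tokens)
    if learned is not None:
        return learned  # a majority of peers agree on a token
    return request_new_token()  # hypothetical stand-in for the real request
```

For example, `start_member(None, ["t1", "t1", "t2"])` adopts `"t1"`, while a member whose peers all report no token falls through to `request_new_token`.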
Abstract:
One embodiment of the present invention provides a switch. The switch includes one or more ports, a persistent storage module, a restoration module, and a retrieval module. The persistent storage module stores configuration information associated with the switch in a data structure, which includes one or more columns for attribute values of the configuration information, in a local persistent storage. The restoration module instantiates a restoration database instance in the persistent storage from an image of the persistent storage. The retrieval module retrieves attribute values from a data structure in a current database instance and the restoration database instance in the persistent storage. The restoration module then applies the differences between attribute values of the restoration database instance and the current database instance in the persistent storage to switch modules of the switch, and operates the restoration database instance as the current database instance in the persistent storage.
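The comparison of attribute values between the current and restoration database instances might look like the sketch below. It assumes each instance can be read as a flat mapping from attribute name to value; the function name `attribute_differences` and the `REMOVED` sentinel are hypothetical and are not taken from the abstract.

```python
from typing import Any, Dict

# Hypothetical marker for an attribute that exists in the current instance
# but not in the restoration instance.
REMOVED = object()


def attribute_differences(
    current: Dict[str, Any], restoration: Dict[str, Any]
) -> Dict[str, Any]:
    """Map each differing attribute to the value it has after restoration."""
    diffs: Dict[str, Any] = {}
    for key in sorted(current.keys() | restoration.keys()):
        old = current.get(key, REMOVED)
        new = restoration.get(key, REMOVED)
        if old != new:
            diffs[key] = new
    return diffs
```

Only the entries returned here would need to be pushed to the switch modules; unchanged attributes are skipped.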
Abstract:
Methods and systems for providing storage services in a networked environment are provided. A management device interfaces with a plurality of management layers that communicate with a plurality of application plugins executed by a plurality of computing devices. Each application plugin is associated with an application for providing storage services for stored objects managed by a storage system. The management device uses the same request and response format to obtain information from the plurality of management layers regarding storage space used by the plurality of applications for storing the stored objects. The management device maintains the storage space information as a storage resource object for virtual storage resources and physical storage resources used by the plurality of applications for storing the stored objects.
Abstract:
Techniques for detecting data loss during site switchover are disclosed. An example method includes storing, at NVRAM of a first node, a plurality of operations of a second node, the first and second nodes being disaster recovery partners. The method also includes, during a switchover from the second node to the first node, receiving an indication of a first number of operations yet to be completed. The method further includes comparing the first number to a second number of operations in the plurality of operations stored at the NVRAM of the first node. The method also includes, in response to the comparing, determining whether at least one operation is missing from the plurality of operations stored in the NVRAM of the first node. The method further includes, in response to determining that at least one operation is missing, failing at least one volume.
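The count comparison in this method can be illustrated with a short sketch. The function names, the use of a plain list for the operations held in NVRAM, and the choice to fail every affected volume are assumptions for illustration; the abstract only requires that at least one volume be failed.

```python
from typing import List, Sequence


def operations_missing(expected_pending: int, nvram_ops: Sequence[object]) -> bool:
    """True when fewer operations are stored than the partner reported pending."""
    return len(nvram_ops) < expected_pending


def volumes_to_fail(
    expected_pending: int, nvram_ops: Sequence[object], volumes: Sequence[str]
) -> List[str]:
    """Return the volumes to fail during switchover (empty if nothing is missing)."""
    if operations_missing(expected_pending, nvram_ops):
        # Assumption: every listed volume is failed; a real system could be
        # more selective.
        return list(volumes)
    return []
```

With three operations reported pending and only two stored, `volumes_to_fail(3, ["op1", "op2"], ["vol0"])` returns `["vol0"]`.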
Abstract:
Embodiments in accordance with the present invention disclose a method, computer program product, and system for optimizing performance of a computer backup solution that includes at least two data movers. The automated method includes measuring data mover performance during operation of a backup cycle, and optimizing the performance of data movers by increasing or decreasing the number of threads operating concurrently in the data movers. The method further includes computation of performance rankings of the data movers and shifting workload among the data movers in accordance with their respective performance rankings, such that the computer backup solution converges toward an optimized configuration.
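One possible reading of the tuning loop is sketched below. The abstract does not say how thread counts are adjusted or how workload is shifted, so the one-step hill-climbing rule, the thread limits, and the "move one job from the slowest to the fastest mover" policy are all assumptions, as are the function names.

```python
from typing import Dict, List


def adjust_threads(
    threads: int,
    previous_throughput: float,
    current_throughput: float,
    min_threads: int = 1,
    max_threads: int = 32,
) -> int:
    """Add a thread if throughput improved, otherwise remove one."""
    if current_throughput > previous_throughput:
        return min(threads + 1, max_threads)
    return max(threads - 1, min_threads)


def rank_movers(throughput: Dict[str, float]) -> List[str]:
    """Order data movers from fastest to slowest (ties broken by name)."""
    return sorted(throughput, key=lambda name: (-throughput[name], name))


def shift_workload(assignments: Dict[str, List[str]], ranking: List[str]) -> None:
    """Move one job from the slowest mover to the fastest one, in place."""
    fastest, slowest = ranking[0], ranking[-1]
    if fastest != slowest and assignments[slowest]:
        assignments[fastest].append(assignments[slowest].pop())
```

Repeating these steps once per backup cycle would nudge the configuration toward the better-performing movers, which is the convergence behavior the abstract describes.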
Abstract:
An apparatus and method detect the use of stale data values due to weak consistency between parallel threads on a computer system. A consistency error detection mechanism uses object code injection to build a consistency error detection table during the operation of an application. When the application is paused, the consistency error detection mechanism uses the consistency error detection table to detect consistency errors where stale data is used by the application. The consistency error detection mechanism alerts the user/programmer to the consistency errors in the application program.
Abstract:
A system includes a multi-process application that runs on primary hosts and is checkpointed by a checkpointer comprising at least one of a kernel-mode checkpointer module and one or more user-space interceptors providing at least one of barrier synchronization, a checkpointing thread, resource flushing, and an application virtualization space. Checkpoints may be written to storage and the application restored from said stored checkpoint at a later time. Checkpointing may be incremental using Page Table Entry (PTE) pages and Virtual Memory Areas (VMA) information. Checkpointing is transparent to the application and requires no modification to the application, operating system, networking stack or libraries. In an alternate embodiment, the kernel-mode checkpointer is built into the kernel.
Abstract:
A system includes reception of a command to recover a database to a point in time, determination of a log backup which covers the point in time, determination of a sequence identifier associated with the log backup, collection of log backups which are older than the determined log backup and associated with the sequence identifier, and of a data backup associated with the sequence identifier, and execution of a recovery of the database based on the determined log backup and the collected log backups and data backup.
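The selection of backups for a point-in-time recovery could be sketched as follows. The `LogBackup` and `DataBackup` record shapes, the integer timestamps, and the choice of the most recent matching data backup are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LogBackup:
    start: int        # first timestamp covered (hypothetical field)
    end: int          # last timestamp covered (hypothetical field)
    sequence_id: str


@dataclass(frozen=True)
class DataBackup:
    taken_at: int     # hypothetical field
    sequence_id: str


def plan_recovery(
    point_in_time: int,
    log_backups: Sequence[LogBackup],
    data_backups: Sequence[DataBackup],
) -> Tuple[DataBackup, List[LogBackup], LogBackup]:
    """Return (data backup, older log backups, covering log backup)."""
    covering = next(
        (lb for lb in log_backups if lb.start <= point_in_time <= lb.end), None
    )
    if covering is None:
        raise ValueError("no log backup covers the requested point in time")
    seq = covering.sequence_id
    # Older log backups from the same sequence, in replay order.
    older = sorted(
        (lb for lb in log_backups
         if lb.sequence_id == seq and lb.end <= covering.start),
        key=lambda lb: lb.start,
    )
    candidates = [
        db for db in data_backups
        if db.sequence_id == seq and db.taken_at <= point_in_time
    ]
    if not candidates:
        raise ValueError("no data backup is associated with the sequence")
    return max(candidates, key=lambda db: db.taken_at), older, covering
```

Filtering on the sequence identifier keeps backups from an unrelated backup history out of the recovery plan, which appears to be the purpose of the identifier in the abstract.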
Abstract:
A method of detecting aberrant behavior in a software application is described. The method includes instantiating replicated applications on computing devices using identical initial settings. Each replicated application is a replicated instance of the software application. Information associated with a first API call from a first replicated application, and information associated with a second API call from a second replicated application, is received. The information includes a call identifier of the API call and a digest. The call identifier is unique during the lifetime of the replicated application issuing it and is identical across the replicated applications. If the first and second call identifiers are identical, the method determines whether the first and second digests match. The method also includes, in response to the first and second digests not matching, signaling that aberrant behavior has occurred. Apparatus and computer readable media are also described.
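The digest comparison can be illustrated with a brief sketch. It assumes each replica reports a mapping from call identifier to digest and that the digest is a SHA-256 hash of the serialized call; the abstract names neither the hash function nor the data layout, and `call_digest` and `find_aberrant_calls` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def call_digest(payload: bytes) -> str:
    """Digest of one API call's serialized information (SHA-256 is assumed)."""
    return hashlib.sha256(payload).hexdigest()


def find_aberrant_calls(first: Dict[int, str], second: Dict[int, str]) -> List[int]:
    """Return call identifiers seen by both replicas whose digests differ."""
    shared = first.keys() & second.keys()
    return sorted(cid for cid in shared if first[cid] != second[cid])
```

A non-empty result would correspond to signaling aberrant behavior; call identifiers reported by only one replica are ignored here because no matching call exists to compare against.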
Abstract:
A system, method, and computer readable medium for asynchronous live migration of applications between two or more servers. The computer readable medium includes computer-executable instructions for execution by a processing system. Primary applications run on primary hosts and one or more replicated instances of each primary application run on one or more backup hosts. Asynchronous live migration is provided through a combination of process replication, logging, barrier synchronization, checkpointing, reliable messaging and message playback. The live migration is transparent to the application and requires no modification to the application, operating system, networking stack or libraries.