Abstract:
Autonomous recovery from a transient hardware failure by executing portions of a stream of program instructions as a transaction. A start of a transaction is created in a stream of program instructions executing on a first processor of a multi-processor computer. A snapshot of a system state information is saved when the transaction begins. When the transaction ends, store data of the transaction is committed. If a transient hardware failure occurs, the transaction is aborted without notifying the computer software application that initiated the stream of program instructions. The transaction is re-executed on a second processor of the multi-processors, based on the saved snapshot of the system state information.
Abstract:
Autonomous recovery from a transient hardware failure by executing portions of a stream of program instructions as a transaction. A start of a transaction is created in a stream of executing program instructions. A snapshot of a system state information is saved when the transaction begins. When a predefined number of program instructions in the stream are executed, the transaction ends, and store data of the transaction is committed. A new transaction then begins. If a transient hardware failure occurs, the transaction is aborted without notifying the computer software application that initiated the stream of program instructions. The transaction is re-executed, based on the saved snapshot of the system state information.
Abstract:
A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
Abstract:
Embodiments may provide a method for performing a replay of a previous execution of a program. The method includes generating an order of recorded chunks of instructions across a plurality of recorded threads based, at least in part, on log files generated from the previous execution of the program. The method includes initiating execution of the program, the executing program having a plurality of threads, each thread having a number of chunks of instructions. The method includes intercepting, by a virtual machine unit executing on a processor, an instruction of a chunk before the instruction is executed. The method includes determining, by a replay module executing on the processor, that the chunk is an active chunk if the chunk is currently in line for execution according to the order of recorded chunks, and responsive to a determination that the chunk is the active chunk, executing the instruction.
Abstract:
Techniques disclosed herein relate to synchronizing a first database with a second database. Embodiments include detecting a write operation modifying properties of a data object in the first database. While the data object is locked, embodiments write object change data to a journal table. Embodiments query the journal table of the recovery database to retrieve a portion of the object change data corresponding to a first window of time and comprising a plurality of entries. The retrieved portion of object change data is processed to create processed object data by collapsing duplicate entries within the plurality of entries. Embodiments retrieve object data from the first database, corresponding to properties of data objects specified in the processed object change data. The retrieved object data is pushed to the second database, whereby the second database is synchronized with the first database.
Abstract:
A system, method, and computer readable medium for reliable messaging between two or more servers. The computer readable medium includes computer-executable instructions for execution by a processing system. Primary applications runs on primary hosts and one or more replicated instances of each primary application run on one or more backup hosts. The reliable messaging ensures consistent ordered delivery of messages in the event that messages are lost; arrive out of order, or in duplicate. The messaging layer operates over TCP or UDP with our without multi-cast and broad-cast and requires no modification to applications, operating system or libraries.
Abstract:
A transactional memory system salvages a hardware lock elision (HLE) transaction. A processor of the transactional memory system executes a lock-acquire instruction in an HLE environment and records information about a lock elided to begin HLE transactional execution of a code region. The processor detects a pending point of failure in the code region during the HLE transactional execution. The processor stops HLE transactional execution at the point of failure in the code region. The processor acquires the lock using the information, and based on acquiring the lock, commits the speculative state of the stopped HLE transactional execution. The processor starts non-transactional execution at the point of failure in the code region.
Abstract:
A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
Abstract:
Methods are provided. A method for swapping-out an offload process from a coprocessor includes issuing a snapify_pause request from a host processor to the coprocessor to initiate a pausing of the offload process executing by the coprocessor and another process executing by the host processor using a plurality of locks. The offload process is previously offloaded from the host processor to the coprocessor. The method further includes issuing a snapify_capture request from the host processor to the coprocessor to initiate a local snapshot capture and saving of the local snapshot capture by the coprocessor. The method also includes issuing a snapify_wait request from the host processor to the coprocessor to wait for the local snapshot capture and the saving of the local snapshot capture to complete by the coprocessor.
Abstract:
Providing a computer program product, system, and method for setting copy permissions for target data in a copy relationship. Source data is copied from a first storage to a first data copy in a second storage. A request is received to copy requested data from the first data copy to a second data copy. The second copy operation is performed to copy the requested first data copy form the second storage to a second data copy in response to determining that the requested first data copy is not in the state that does not permit the copying. The request is denied in response to determining that the requested first data copy is in the state that does not permit copying.