Abstract:
A system for maintaining the reliability of shared data structures, such as message queues, in a multi-processor data processing system is disclosed. The system includes a plurality of virtual memory-type processor units (10) in which the processors share virtual memory and can access each one of a plurality of virtual memory segments by the same virtual memory address. The system assures the reliability of system-wide shared data structures in the event of a failure of one of the processors by maintaining at least two copies of each data structure and by maintaining two copies of a table used in locating such data structures. The system updates copies of such shared data structures that are stored in at least two different processing units with the results of a specified data processing transaction, which may have affected the information in such data structures, in a manner that ensures that either identical updates occur or no update occurs. The system further ensures that any changes that might have occurred in the information stored in the data structure prior to the end of an aborted transaction are undone, returning the data structure to its initial state to permit the transaction to be retried.
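The "identical updates or no update" property described above can be illustrated with a minimal in-process sketch. It assumes each replica is a plain Python list standing in for a message queue, and the function name `update_replicas` is hypothetical. It does not model a processor failing in the middle of the commit step, which a real multi-processor system would have to handle.

```python
import copy

def update_replicas(replicas, transaction):
    """Apply `transaction` to every replica, or to none of them.

    `replicas` is a list of lists (hypothetical stand-ins for the copies
    of a shared queue); `transaction` is a callable that mutates one copy.
    """
    staged = []
    try:
        for replica in replicas:
            working = copy.deepcopy(replica)  # never touch the original yet
            transaction(working)
            staged.append(working)
    except Exception:
        # Abort: the originals are still in their initial state,
        # so the transaction can simply be retried.
        return False
    # Commit: every staged copy succeeded, so install them all.
    for replica, working in zip(replicas, staged):
        replica[:] = working
    return True
```

Staging the changes on private copies is one simple way to keep the initial state available for a retry; an undo log would be an alternative design.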
Abstract:
Apparatus and method for reading data pages (33) in a transaction processing system (20) without locking the pages are disclosed. The system maintains a Global_Committed_LSN (36) identifying the oldest uncommitted transaction accessing any of the data, and Object_Committed_LSNs (38a, 38b) identifying the oldest uncommitted transactions accessing particular files, tables and indexes. Each data page includes a Page_LSN (35) identifying the last transaction to have updated the page. To read a page, a transaction first latches the page and compares the page's Page_LSN with the Global_Committed_LSN, or with the page's respective Object_Committed_LSN. If the Page_LSN is older than the Committed_LSN with which it was compared, then the transaction reads the page without locking it, since there can be no uncommitted transaction in process which might have updated the page's data. However, if the Page_LSN is younger than the Committed_LSN, the page is locked before being read.
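The comparison described above might be sketched as follows. This is a hedged illustration, not the patented implementation: it assumes LSNs are monotonically increasing integers (so "older" means "smaller"), it conservatively treats an equal LSN as requiring a lock, and it only reports whether a lock would be needed rather than modelling a lock manager. The names `Page` and `read_page` are hypothetical.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class Page:
    page_lsn: int  # LSN of the last update applied to this page
    data: str
    # Short-term latch protecting the physical page (assumption: a mutex)
    latch: threading.Lock = field(default_factory=threading.Lock, repr=False)

def read_page(page, global_committed_lsn, object_committed_lsn=None):
    """Return (mode, data), where mode is 'no_lock' or 'lock'."""
    # Prefer the finer-grained per-object value when one is available.
    committed_lsn = (object_committed_lsn
                     if object_committed_lsn is not None
                     else global_committed_lsn)
    with page.latch:
        if page.page_lsn < committed_lsn:
            # Every update on the page predates the oldest uncommitted
            # transaction, so the data is known to be committed.
            return ("no_lock", page.data)
        # The page may hold uncommitted data: a lock is needed first.
        return ("lock", page.data)
```

A per-object Committed_LSN can allow lock-free reads that the global value would not, because one long-running transaction on another object no longer holds the threshold back.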
Abstract:
In one embodiment, a system for managing communication connections in a virtualization environment includes a plurality of host machines implementing a virtualization environment, wherein each of the host machines includes a hypervisor, at least one user virtual machine (user VM), and a distributed file server that includes file server virtual machines (FSVMs) and associated local storage devices. Each FSVM and associated local storage device are local to a corresponding one of the host machines, and the FSVMs conduct I/O transactions with their associated local storage devices based on I/O requests received from the user VMs. Each of the user VMs on each host machine sends each of its respective I/O requests to an FSVM that is selected by one or more of the FSVMs for each I/O request based on a lookup table that maps a storage item referenced by the I/O request to the selected one of the FSVMs.
Abstract:
Techniques are described herein for quick identification of a set of units of data for which recovery operations are to be performed to redo or undo changes made by a failed node. When a lock is requested by an instance, lock information for the lock request is replicated by another instance. If the instance fails, the other instance may use the replicated lock information to determine a set of data blocks for recovery operations. The set of data blocks is available in memory of a recovery instance when a given node fails, and does not have to be completely generated by scanning a redo log.
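A minimal sketch of the replication idea above, assuming two in-process objects stand in for database instances and that lock information is reduced to a set of data block identifiers. The class `Instance` and the functions `request_lock` and `recovery_set` are hypothetical names used only for illustration.

```python
class Instance:
    """Hypothetical database instance holding locks and peer replicas."""

    def __init__(self, name):
        self.name = name
        self.locked_blocks = set()  # blocks this instance has locked
        self.replicated = {}        # peer name -> blocks that peer locked

def request_lock(owner, mirror, block_id):
    """Grant a lock to `owner` and replicate the lock info on `mirror`."""
    owner.locked_blocks.add(block_id)
    mirror.replicated.setdefault(owner.name, set()).add(block_id)

def recovery_set(mirror, failed_name):
    """Blocks needing redo/undo after `failed_name` fails.

    The set comes straight from the mirror's memory, so no redo-log
    scan is needed to build it.
    """
    return set(mirror.replicated.get(failed_name, set()))
```

For example, after `request_lock(a, b, 7)` the surviving instance `b` can answer `recovery_set(b, "A")` even though `a` and its in-memory lock table are gone.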
Abstract:
Autonomous recovery from a transient hardware failure by executing portions of a stream of program instructions as a transaction. A start of a transaction is created in a stream of executing program instructions. A snapshot of system state information is saved when the transaction begins. When a predefined number of program instructions in the stream are executed, the transaction ends, and store data of the transaction is committed. A new transaction then begins. If a transient hardware failure occurs, the transaction is aborted without notifying the computer software application that initiated the stream of program instructions. The transaction is re-executed, based on the saved snapshot of the system state information.
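The snapshot, commit, and re-execute cycle described above can be sketched in software. This is only an analogy for what the abstract describes at the hardware level: it assumes instructions are Python callables acting on a dict, that a transient fault surfaces as a hypothetical `TransientHardwareFault` exception, and that a retry limit (`max_retries`, also an assumption) guards against a fault that is not actually transient.

```python
import copy

class TransientHardwareFault(Exception):
    """Hypothetical signal standing in for a transient hardware failure."""

def run_stream(instructions, state, chunk_size, max_retries=3):
    """Execute `instructions` in transactions of `chunk_size` instructions."""
    i = 0
    retries = 0
    while i < len(instructions):
        snapshot = copy.deepcopy(state)  # saved when the transaction begins
        try:
            for instr in instructions[i:i + chunk_size]:
                instr(state)
        except TransientHardwareFault:
            # Abort silently: roll back to the snapshot and re-execute.
            state.clear()
            state.update(snapshot)
            retries += 1
            if retries > max_retries:
                raise
            continue
        # Transaction ends: its stores are committed, a new one begins.
        i += chunk_size
        retries = 0
    return state
```

The caller of `run_stream` never sees a fault that is cleared by a retry, which mirrors the abstract's point that the initiating application is not notified.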
Abstract:
Systems and methods for failure recovery in shared storage operations. An example method comprises: acquiring a lock with respect to a storage domain comprising a specified disk image; creating a transaction marker associated with the disk image; creating a component of a new volume associated with the disk image; destroying the transaction marker; and releasing the lock with respect to the storage domain.
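The five steps of the example method can be sketched with files in a directory. This is an illustration under stated assumptions: a `threading.Lock` stands in for the storage-domain lock, the marker and the volume component are ordinary files with made-up suffixes, and the idea that a leftover marker signals an interrupted operation is an inference from the abstract's title rather than something it states. All function names are hypothetical.

```python
import os
import threading

domain_lock = threading.Lock()  # stand-in for the storage-domain lock

def create_volume(domain_dir, image_id, lock=domain_lock, fail_midway=False):
    marker = os.path.join(domain_dir, f"{image_id}.txn")
    component = os.path.join(domain_dir, f"{image_id}.vol")
    with lock:                             # acquire the lock
        open(marker, "w").close()          # create the transaction marker
        with open(component, "w") as f:    # create a volume component
            f.write("volume metadata")
        if fail_midway:
            raise RuntimeError("simulated failure before cleanup")
        os.remove(marker)                  # destroy the transaction marker
    # leaving the `with` block releases the lock, on success or failure

def interrupted(domain_dir, image_id):
    """True if a marker was left behind by an unfinished operation."""
    return os.path.exists(os.path.join(domain_dir, f"{image_id}.txn"))
```

A recovery pass could call `interrupted` for each disk image and discard any partially created component it finds.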
Abstract:
Implementations are disclosed for a centralized peripheral access controller (PAC) that is configured to protect one or more peripheral components in a system. In some implementations, the PAC stores data that can be set or cleared by software. The data corresponds to an output signal of the PAC that is routed to a corresponding peripheral component. When the data indicates that the peripheral component is “unlocked”, the PAC will allow write transfers to registers in the peripheral component. When the data indicates that the peripheral component is “locked”, the PAC will refuse write transfers to registers in the peripheral component and terminate them with an error.
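A small software model of the locked/unlocked behaviour described above, assuming one lock bit per peripheral and using a raised `PermissionError` to stand in for the error termination of a write transfer. The class and method names are hypothetical and do not correspond to any real device's register interface.

```python
class PeripheralAccessController:
    """Toy model of a PAC guarding writes to peripheral registers."""

    def __init__(self, peripherals):
        # One software-settable bit per peripheral; all start unlocked.
        self._locked = {name: False for name in peripherals}
        self._registers = {name: {} for name in peripherals}

    def lock(self, name):
        self._locked[name] = True

    def unlock(self, name):
        self._locked[name] = False

    def write(self, name, register, value):
        if self._locked[name]:
            # Refuse the write transfer and terminate it with an error.
            raise PermissionError(f"{name} is locked")
        self._registers[name][register] = value

    def read(self, name, register):
        # Only writes are described as protected, so reads pass through.
        return self._registers[name].get(register)
```

Centralising the check in one controller means a peripheral does not need its own protection logic; it only receives the writes the controller allows.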
Abstract:
A DSM component is organized as a matrix of pages. A data structure of a set of data structures occupies a column in the matrix of pages. A recovery file is maintained in persistent storage. The recovery file consists of entries, and each entry corresponds, by its location, to a column in the matrix of pages. The set of data structures is stored in the DSM component and in the persistent storage. Each entry in the recovery file incorporates an indication of whether the associated column in the matrix of pages is assigned to a data structure of the set and, additionally, identifying key properties of that data structure.
Abstract:
A system is described for identifying key lock contention issues in computing devices. A computing device is executed and lock contention information relating to operations during execution of the computing device is recorded. The data is parsed and analyzed to determine blocking relationships between operations due to lock contention. Algorithms are implemented to analyze dependencies between operations based on the data and to identify key areas of optimization for performance improvement. Algorithms can be based on the Hyperlink-Induced Topic Search algorithm or the PageRank algorithm.
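As a rough illustration of the PageRank-based variant mentioned above, the sketch below ranks operations in a blocking graph. It assumes the recorded data has already been parsed into `(blocked, blocker)` pairs, and it uses a plain power-iteration PageRank with a conventional damping factor of 0.85. The edge direction, function name, and parameters are assumptions, not details taken from the abstract.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given as (source, target) pairs.

    Here an edge (blocked, blocker) means `blocked` waited on a lock held
    by `blocker`, so a high rank suggests a key source of contention.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1.0 - damping) / count + damping * dangling / count
                    for n in nodes}
        for n in nodes:
            for dst in out_links[n]:
                new_rank[dst] += damping * rank[n] / len(out_links[n])
        rank = new_rank
    return rank
```

With edges such as `[("B", "A"), ("C", "A"), ("D", "C")]`, operation `A` blocks two others directly and a third indirectly, so it receives the highest rank and would be the first candidate for optimization.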
Abstract:
According to one aspect, provided are methods and systems for minimizing lock contention in a distributed database environment. The methods and systems can include a database management component configured to manage database instances, the database management component also configured to receive a first data request operation on the distributed database, an execution component configured to process the first data request operation including at least one write request on at least one database instance managed by the database management component, and a fault prediction component configured to detect a potential page fault responsive to a target data of the write request, wherein the execution component is further configured to suspend execution of the first data request operation, request access to a physical storage to read the target data into active memory, and re-execute the first data request operation after a period of time for suspending the first data request operation.