Abstract:
The present invention minimizes the cost of establishing a cluster that utilizes shared storage by creating a storage namespace within the cluster that makes each storage device, which is physically connected to any of the nodes in the cluster, appear to be physically connected to all nodes in the cluster. A virtual host bus adapter (VHBA) is executed on each node, and is used to create the storage namespace. Each VHBA determines which storage devices are physically connected to the node on which the VHBA executes, as well as each storage device that is physically connected to each of the other nodes. All storage devices determined in this manner are aggregated into the storage namespace which is then presented to the operating system on each node so as to provide the illusion that all storage devices in the storage namespace are physically connected to each node.
Abstract:
A cluster based file service may operate on a cluster of two or more independent devices that have access to a common data storage. The file service may have a namespace definition with each device in the cluster, but may be modified by any device operating the file service. Each instance of the file service may identify and capture a command that changes the namespace structure and cause the change to be propagated to the other members of cluster. If one of the devices in the cluster does not successfully perform an update to the namespace structure, that device may be brought offline. The cluster based file service may permit adding or removing devices from the cluster while the file service is operating, and may provide a high throughput and high availability file service.
Abstract:
Described is a technology by which a storage volume is shared by cluster nodes of a server cluster. In one implementation, each node includes a redirector that provides shared access to the volume from that node. The redirector routes file system metadata requests from applications and the like through a first (e.g., SMB) communications path to the owning node, and routes file system read and write data to the storage device through a second, high-speed communications path such as direct direct block level I/O. An owning node maintains ownership of the storage device through a persistent reservation mechanism that writes a key to a registration table associated with the storage device. Non-owning nodes write a shared key. The owning node validates the shared keys against cluster membership data, and preempts (e.g., removes) any key deemed not valid. Security mechanisms for controlling access are also described.
Abstract:
Embodiments provide a method and system for sharing storage among a plurality of virtual machines. Specifically, one or more embodiments are directed to sharing a virtual hard disk with various virtual machines in a virtual machine cluster. In embodiments, a command is sent from a virtual machine to a local parser. The parser prepares the command for transport over a file system protocol. The command is sent to a remote file server using the file system protocol. When the command is received by the file server, the file server unpacks the command, determines features about the command and converts the command to a format that executes the command on the virtual shared storage.
Abstract:
Embodiments are directed to creating global, aggregated namespaces for storage management and to providing consistent namespaces in a distributed storage system. In one scenario, a computer system defines data storage objects for each data storage node. The data storage objects uniquely identify storage elements of the data storage nodes, where each data storage object includes various associated attributes. The computer system replicates the defined data storage objects and any associated attributes from a first data storage node to a second, different data storage node among the data storage nodes. As such, the defined data storage objects are visible from any node in the data storage nodes. The computer system also aggregates the defined data storage objects for each of the data storage nodes and creates a global, aggregated namespace that includes the aggregated data storage objects for each of the data storage nodes.
Abstract:
The present invention extends to methods, systems, and computer program products for implementing a cache using multiple page replacement algorithms. An exemplary cache can include two logical portions where the first portion implements the least recently used (LRU) algorithm and the second portion implements the least recently used two (LRU2) algorithm to perform page replacement within the respective portion. By implementing multiple algorithms, a more efficient cache can be implemented where the pages most likely to be accessed again are retained in the cache. Multiple page replacement algorithms can be used in any cache including an operating system cache for caching pages accessed via buffered I/O, as well as a cache for caching pages accessed via unbuffered I/O such as accesses to virtual disks made by virtual machines.
Abstract:
The present invention extends to methods, systems, and computer program products for creating a snapshot of a shared volume that is application consistent across various nodes of a cluster. The invention enables a snapshot of a volume to be initiated on one node which causes all applications in the cluster that use the volume to persist their data to the volume prior to the snapshot being created. Accordingly, the snapshot is application consistent to all applications in the cluster that use the volume. The invention also enables applications on various nodes to perform post snapshot processing on the created snapshot. The invention can be used in an existing backup system that is not cluster aware to enable the existing backup system to create application consistent snapshots of a volume shared by applications across multiple nodes of a cluster.
Abstract:
Embodiments provide a method and system for enabling access to a storage device. Specifically, a node may request admittance to a cluster that has read and write access to a storage device. The node seeking access to the storage device must be first be approved by other nodes in the cluster. As part of the request, the node seeking access to the storage device sends a registration key to a storage device. Upon expiration of a registration timer, the node seeking access to the storage device receives a registration table from the storage device and determines whether its registration key is stored in the registration table. If the registration key is stored in the registration table the node has been accepted in the cluster and as a result, has been granted read and write access to the storage device.
Abstract:
The targeted recovery of application-specific data corresponding to an application without performing recovery of the entire volume. The recovery is initiated by beginning to copy the prior state of the content of an application-specific data container from a prior snapshot to the application-specific data container in an operation volume accessible by the application. However, while the content of the application-specific data container is still being copied from the snapshot to the application-specific data container, the application is still permitted to perform read and write operations on the application-specific data container. Thus, the application-specific data container appears to the application to be fully accessible even though recovery of the content of the application-specific data container is still continuing in the background.
Abstract:
The present invention extends to methods, systems, and computer program products for implementing persistent reservation techniques for establishing ownership of one or more physical disks. These persistent reservation techniques can be employed to determine ownership of physical disks in a storage pool as well as in any other storage configuration. Using the persistent reservation techniques of the present invention, when a network partition occurs, a defender of a physical disk does not remove a challenger's registration key until the defender receives notification that the challenger is no longer in the defender's partition. In this way, pending I/O from applications executing on the challenger will not fail due to the challenger's key being removed until the proper ownership of the physical disk can be resolved.