Abstract:
A shared-nothing database system is provided in which the rows of each table are assigned to "slices", and multiple copies ("duplicas") of each slice are stored across the persistent storage of multiple nodes. Requests to read data from a particular row of the table may be handled by any node that stores a duplica of the slice to which the row is assigned. For each slice, a single duplica of the slice is designated as the "primary duplica". All DML operations are performed by the node that has the primary duplica of the slice to which the target row is assigned. The changes are then propagated to other duplicas ("secondary duplicas") of the same slice.
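
As a rough illustration of the read and write routing this abstract describes, the following Python sketch hashes each row key to a slice, serves reads from any node holding a duplica of that slice, and applies DML only on the node holding the primary duplica before propagating the change to the secondaries. The class and method names (Node, Slice, SlicedTable, apply_dml, and so on) are assumptions made for the example, not terms from the patent.

# Minimal sketch (not the patented implementation) of routing reads and DML
# by slice and duplica role; all names here are illustrative assumptions.
import hashlib
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.rows = {}                   # local duplica storage: row_key -> values

    def read_row(self, row_key):
        return self.rows.get(row_key)

    def apply_dml(self, row_key, new_values):
        self.rows[row_key] = new_values
        return (row_key, new_values)     # change record to propagate to secondaries

    def apply_change(self, change):
        row_key, new_values = change
        self.rows[row_key] = new_values

class Slice:
    def __init__(self, primary_node, secondary_nodes):
        self.primary_node = primary_node               # node holding the primary duplica
        self.secondary_nodes = list(secondary_nodes)   # nodes holding secondary duplicas

    def all_nodes(self):
        return [self.primary_node] + self.secondary_nodes

class SlicedTable:
    def __init__(self, slices):
        self.slices = slices             # list of Slice objects

    def slice_for_row(self, row_key):
        # Assign a row to a slice, e.g. by hashing its key.
        h = int(hashlib.sha256(str(row_key).encode()).hexdigest(), 16)
        return self.slices[h % len(self.slices)]

    def read(self, row_key):
        # A read may be served by any node holding a duplica of the slice.
        s = self.slice_for_row(row_key)
        node = random.choice(s.all_nodes())
        return node.read_row(row_key)

    def update(self, row_key, new_values):
        # DML is performed only by the node with the primary duplica,
        # then propagated to the secondary duplicas.
        s = self.slice_for_row(row_key)
        change = s.primary_node.apply_dml(row_key, new_values)
        for node in s.secondary_nodes:
            node.apply_change(change)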
Abstract:
Techniques for implementing a buffer cache for a persistent file system in non-volatile memory are provided. A set of data is maintained in one or more extents in non-volatile random-access memory (NVRAM) of a computing device. At least one buffer header is allocated in dynamic random-access memory (DRAM) of the computing device. In response to a read request by a first process executing on the computing device to access one or more first data blocks in a first extent of the one or more extents, the first process is granted direct read access of the first extent in NVRAM. A reference to the first extent in NVRAM is stored in a first buffer header. The first buffer header is associated with the first process. The first process uses the first buffer header to directly access the one or more first data blocks in NVRAM.
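
The following is a minimal Python sketch of the DRAM buffer header idea described above: the header records a reference to an extent that stands in for NVRAM, and a reader process uses that reference to access blocks in place rather than copying them into DRAM. The names (Extent, BufferHeader, BufferCache, grant_direct_read) are illustrative assumptions, not the patent's API.

# Illustrative sketch only: a DRAM-resident buffer header that records a
# reference into an NVRAM extent so a reader can access blocks in place.

class Extent:
    """Stands in for an extent of data blocks held in NVRAM."""
    def __init__(self, extent_id, blocks):
        self.extent_id = extent_id
        self.blocks = blocks             # block_number -> bytes

class BufferHeader:
    """Allocated in DRAM; points at an extent in NVRAM instead of copying it."""
    def __init__(self):
        self.extent_ref = None           # reference to the NVRAM extent
        self.owner_pid = None            # process the header is associated with

class BufferCache:
    def __init__(self, extents, num_headers):
        self.extents = {e.extent_id: e for e in extents}              # "NVRAM"
        self.headers = [BufferHeader() for _ in range(num_headers)]   # "DRAM"

    def grant_direct_read(self, pid, extent_id):
        # Associate a free header with the requesting process and store a
        # reference to the extent; no block copy into DRAM is made.
        header = next(h for h in self.headers if h.owner_pid is None)
        header.extent_ref = self.extents[extent_id]
        header.owner_pid = pid
        return header

    def read_block(self, header, block_number):
        # The process uses its header to read the block directly "in NVRAM".
        return header.extent_ref.blocks[block_number]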
Abstract:
Techniques are provided for using an intermediate cache between the shared cache of an application and the non-volatile storage of a storage system. The application may be any type of application that uses a storage system to persistently store data. The intermediate cache may be local to the machine upon which the application is executing, or may be implemented within the storage system. In one embodiment where the application is a database server, the database system includes both a DB server-side intermediate cache, and a storage-side intermediate cache. The caching policies used to populate the intermediate cache are intelligent, taking into account factors that may include which object an item belongs to, the item type of the item, a characteristic of the item, or the type of operation in which the item is involved.
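
A toy admission policy can make the "intelligent caching policy" idea concrete: the decision to place an item in the intermediate cache considers the owning object, the item type, a characteristic of the item, and the operation that produced the access. The specific rules, field names, and thresholds below are invented for illustration and are not taken from the abstract.

# Toy admission policy for an intermediate cache; every rule here is an
# invented example of the kinds of factors the abstract mentions.

def should_cache(item):
    # item is a dict such as:
    # {"object": "ORDERS", "item_type": "index_block",
    #  "compressed": False, "operation": "table_scan"}
    if item["operation"] == "backup":
        return False                     # e.g. do not pollute the cache with backup reads
    if item["item_type"] == "index_block":
        return True                      # e.g. favor index blocks
    if item["object"] in {"ORDERS", "CUSTOMERS"}:
        return True                      # e.g. objects an administrator marked as hot
    if item.get("compressed"):
        return False                     # e.g. skip items that are cheap to re-read
    return item["operation"] != "table_scan"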
Abstract:
Techniques herein store database blocks (DBBs) in byte-addressable persistent memory (PMEM) and prevent tearing without deadlocking or waiting. In an embodiment, a computer hosts a DBMS. A reader process of the DBMS obtains, without locking and from metadata in PMEM, a first memory address for directly accessing a current version, which is a particular version, of a DBB in PMEM. Concurrently and without locking: a) the reader process reads the particular version of the DBB in PMEM, and b) a writer process of the DBMS replaces, in the metadata in PMEM, the first memory address with a second memory address for directly accessing a new version of the DBB in PMEM. In an embodiment, a computer performs without locking: a) storing, in PMEM, a DBB, b) copying into volatile memory, or reading, an image of the DBB, and c) detecting whether the image of the DBB is torn.
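
The sketch below illustrates, with ordinary Python references standing in for PMEM addresses, the two ideas in this abstract: a writer publishes a new database block version by replacing a single reference in metadata while readers follow that reference without locking, and a reader detects a torn image by comparing version stamps recorded at the head and tail of the block. This is a conceptual model only; the names are assumptions.

# Conceptual sketch: lock-free publication of a new block version via a
# single reference swap, plus head/tail stamps for tear detection.

class BlockVersion:
    def __init__(self, version, payload):
        self.head_version = version     # stamped before the payload
        self.payload = bytes(payload)
        self.tail_version = version     # stamped after the payload

class BlockMetadata:
    """Holds the 'memory address' (here: a reference) of the current version."""
    def __init__(self, initial):
        self.current = initial          # readers dereference this without locking

def write_new_version(meta, version, payload):
    # The writer builds the new version out of place, then publishes it by
    # replacing the reference in metadata; readers never block on this.
    meta.current = BlockVersion(version, payload)

def read_block(meta):
    # The reader follows the metadata reference without locking. If an image
    # were being copied byte by byte while a write landed, mismatched head and
    # tail stamps in the copy would reveal a torn image.
    v = meta.current
    image = (v.head_version, v.payload, v.tail_version)
    torn = image[0] != image[2]
    return image, torn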
Abstract:
Data blocks are cached in a persistent cache ("NV cache") allocated from non-volatile RAM ("NVRAM"). The data blocks may be accessed in place in the NV cache of a "source" computing element by another "remote" computing element over a network using remote direct memory access ("RDMA"). In order for a remote computing element to access a data block in the NV cache on a source computing element, the remote computing element needs the memory address of the data block within the NV cache. For this purpose, a hash table is stored and maintained in RAM on the source computing element. The hash table identifies the data blocks in the NV cache and specifies the location of each cached data block within the NV cache.
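
The hash table described above can be pictured with a short Python sketch: a RAM-resident table on the source computing element maps each data block identifier to its offset and length inside the NV cache, which is what a remote computing element would need before issuing an RDMA read. The RDMA transfer itself is only simulated by a local copy here, and all names are illustrative assumptions.

# Sketch of a block index over an NV cache; the bytearray stands in for the
# NVRAM-backed cache and the dict for the RAM-resident hash table.

class NVCache:
    def __init__(self, size):
        self.buffer = bytearray(size)    # stands in for the NVRAM-backed cache
        self.block_index = {}            # hash table: block_id -> (offset, length)
        self.next_free = 0

    def cache_block(self, block_id, data):
        offset = self.next_free
        self.buffer[offset:offset + len(data)] = data
        self.block_index[block_id] = (offset, len(data))
        self.next_free += len(data)

    def locate(self, block_id):
        # Remote elements consult this table (over the network in practice)
        # to learn where the block lives inside the NV cache.
        return self.block_index.get(block_id)

def remote_read(source_cache, block_id):
    loc = source_cache.locate(block_id)
    if loc is None:
        return None                      # block is not cached on the source
    offset, length = loc
    # An RDMA read of [offset, offset + length) would happen here; we just copy.
    return bytes(source_cache.buffer[offset:offset + length])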
Abstract:
Techniques are provided for storing an in-memory unit (IMU) in a lower storage tier and copying the IMU to DRAM when needed for query processing. Techniques are also provided for copying IMUs to lower tiers of storage when they are evicted from the cache of a higher tier of storage. Techniques are further provided for implementing IMU functionality within a storage system, enabling database servers to push tasks, such as filtering, to the storage system, where the storage system may access IMUs within its own memory to perform the tasks. Metadata associated with a set of data may be used to indicate whether an IMU for the data should be created by the database server machine or within the storage system.
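
A small Python sketch of the tiering behavior may help: an IMU records which tier it currently occupies, is demoted one tier on eviction rather than discarded, and is copied back to DRAM when a query needs it, while a storage-side helper shows a pushed-down filter being applied against the storage system's own IMU. The tier names and the metadata flag are assumptions made for the example.

# Illustrative tiering sketch; tiers, flags, and helper names are invented.

TIERS = ["dram", "nvram", "flash", "disk"]   # higher to lower

class IMU:
    def __init__(self, data, create_in_storage=False):
        self.data = data
        # Metadata indicating whether the IMU should be built by the database
        # server (DRAM) or inside the storage system.
        self.tier = "flash" if create_in_storage else "dram"

    def evict(self):
        # On eviction from a higher tier, copy the IMU to the next lower tier
        # instead of discarding it.
        i = TIERS.index(self.tier)
        if i + 1 < len(TIERS):
            self.tier = TIERS[i + 1]

    def load_for_query(self):
        # Copy the IMU back into DRAM when needed for query processing.
        self.tier = "dram"
        return self.data

def storage_side_filter(imu, predicate):
    # The storage system applies a pushed-down filter against its own IMU
    # without shipping raw data to the database server.
    return [row for row in imu.data if predicate(row)]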
Abstract:
Methods, computer-readable media, and computer systems are provided for initiating storage of data on multiple storage devices and confirming storage of the data after the data has been stored on one but not necessarily all of the devices. A storage server receives, from a client, a request to store data. In response to the request, the storage server initiates, in parallel, storage of the data on multiple storage systems. The storage server detects that the data has been stored on any one of the storage systems, such as an auxiliary system, and, in response, indicates, to the client, that the data has been stored. The storage server may flush or discard data on the auxiliary storage system upon detecting that the data has been successfully stored on a target storage system, where the data persists.
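
The acknowledgment flow in this abstract can be sketched in Python with two simulated storage systems: the storage server starts both writes in parallel, confirms to the client as soon as either copy is durable, and discards the auxiliary copy once the slower target write has also completed. The thread-based simulation and the names are illustrative only.

# Sketch of early acknowledgment: confirm to the client after one copy is
# stored, then drop the auxiliary copy once the target holds the data.
import threading
import time

class StorageSystem:
    def __init__(self, name, delay):
        self.name = name
        self.delay = delay               # simulated write latency in seconds
        self.data = {}

    def store(self, key, value):
        time.sleep(self.delay)           # pretend the write takes time
        self.data[key] = value

    def discard(self, key):
        self.data.pop(key, None)

def store_with_early_ack(target, auxiliary, key, value, client_ack):
    ack_lock = threading.Lock()
    acked = [False]

    def write_to(system):
        system.store(key, value)
        with ack_lock:
            if not acked[0]:
                acked[0] = True
                client_ack(system.name)  # confirm to the client after ONE copy

    threads = [threading.Thread(target=write_to, args=(s,))
               for s in (target, auxiliary)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The target now holds the data persistently; the auxiliary copy can go.
    auxiliary.discard(key)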