Abstract:
Techniques are provided for supplying a data item to a transaction in a multi-versioning system in which the data item may exist on multiple versions of a data block, and where versioning is performed at the granularity of the data block. According to one aspect of the invention, the technique involves locating, within volatile memory, a first version of a data block that includes a first version of the data item. It is then determined whether the first version of the data item is usable by the transaction, without respect to whether the first version of the data block is generally usable by the transaction. If the first version of the data item is usable by the transaction, then the data item is established as a candidate that can be provided to the transaction. Thus, the data item within a block may be considered a candidate to be provided to a transaction even when the version of the data block on which the data item resides would otherwise disqualify the data block from being seen by that transaction. If the first version of the data item is not usable by the transaction, then a version of the data item that is usable by the transaction is obtained from a second version of the data block that is different from the first version.
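The item-level check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `BlockVersion`, `block_scn`, `item_scn`, and `read_item`, and the use of integer change numbers compared against a snapshot number, are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class BlockVersion:
    # Hypothetical: change number of the latest change anywhere in the block.
    block_scn: int
    # Hypothetical: item key -> (value, change number of that item's last change).
    items: Dict[str, Tuple[str, int]]


def read_item(versions: List[BlockVersion], key: str,
              snapshot_scn: int) -> Optional[Tuple[str, int]]:
    """Return (value, block_scn of the block version that served it).

    `versions` is ordered newest first. The item is tested on its own
    change number, regardless of whether the whole block version would
    be visible to the transaction's snapshot.
    """
    for version in versions:
        if key not in version.items:
            continue
        value, item_scn = version.items[key]
        if item_scn <= snapshot_scn:
            return value, version.block_scn
        # Item too new for this transaction: fall through to an older block version.
    return None


versions = [
    BlockVersion(20, {"a": ("a1", 5), "b": ("b2", 20)}),   # newest block version
    BlockVersion(10, {"a": ("a1", 5), "b": ("b1", 10)}),   # older block version
]
print(read_item(versions, "a", 12))  # served from the newest block version
print(read_item(versions, "b", 12))  # served from the older block version
```

In this sketch, item `a` is taken from the newest block version even though that block as a whole (change number 20) is newer than the snapshot (12), while item `b` has to come from the older block version.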
Abstract:
Techniques are provided for maintaining data persistently in one format, but making that data available to a database server in more than one format. Data that is in the format that is independent of the disk format may be maintained exclusively in volatile memory to reduce the overhead associated with keeping the data in sync with the on-disk format copies of the data. Selection of data to be maintained in the volatile memory may be based on various factors. Once selected the data may also be compressed to save space in the volatile memory. The compression level may depend on one or more factors that are evaluated for the selected data. The factors for the selection and compression level of data may be periodically evaluated, and based on the evaluation, the selected data may be removed from the volatile memory or its compression level changed accordingly.
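A rough sketch of the dual-format idea, assuming a row-oriented persistent copy and a column-oriented copy held only in memory. The class `InMemoryStore`, the access-rate threshold, and the choice of `zlib` levels are illustrative assumptions and do not come from the abstract.

```python
import json
import zlib


class InMemoryStore:
    """Hypothetical in-memory columnar copies of row-format tables."""

    def __init__(self):
        self.units = {}  # table name -> (compression level, compressed bytes)

    def populate(self, table, rows, accesses_per_hour):
        # Pivot the row-format data into columns for the in-memory copy.
        columns = [list(col) for col in zip(*rows)]
        # Assumed factor: frequently read data gets lighter compression.
        level = 1 if accesses_per_hour > 100 else 9
        blob = zlib.compress(json.dumps(columns).encode("utf-8"), level)
        self.units[table] = (level, blob)

    def read_column(self, table, col):
        _, blob = self.units[table]
        return json.loads(zlib.decompress(blob))[col]

    def evict(self, table):
        # Dropping the in-memory copy never touches the persistent row format.
        self.units.pop(table, None)


store = InMemoryStore()
store.populate("t", [(1, "a"), (2, "b")], accesses_per_hour=500)
print(store.read_column("t", 0))
```

A periodic re-evaluation could call `populate` again with fresh statistics to change the compression level, or `evict` to remove data that no longer qualifies.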
Abstract:
Techniques are described for materializing pre-computed results of expressions. In an embodiment, a set of one or more column units are stored in volatile or non-volatile memory. Each column unit corresponds to a column that belongs to an on-disk table within a database managed by a database server instance and includes data items from the corresponding column. A set of one or more virtual column units, and data that associates the set of one or more column units with the set of one or more virtual column units, are also stored in memory. The set of one or more virtual column units includes a particular virtual column unit storing results that are derived by evaluating an expression on at least one column of the on-disk table.
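As an illustration of the relationship between column units and virtual column units, the following sketch stores both side by side and fills a virtual unit by evaluating an expression over existing columns. The names `ColumnUnit`, `InMemoryUnit`, and `materialize` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ColumnUnit:
    column: str
    values: List[float]


@dataclass
class InMemoryUnit:
    # Column units mirror columns of the on-disk table.
    column_units: Dict[str, ColumnUnit]
    # Virtual column units hold pre-computed expression results and are
    # associated with the column units by living in the same container.
    virtual_units: Dict[str, ColumnUnit] = field(default_factory=dict)

    def materialize(self, name: str, inputs: List[str],
                    expr: Callable[..., float]) -> None:
        cols = [self.column_units[c].values for c in inputs]
        results = [expr(*row) for row in zip(*cols)]
        self.virtual_units[name] = ColumnUnit(name, results)


imu = InMemoryUnit({
    "price": ColumnUnit("price", [10.0, 20.0]),
    "qty": ColumnUnit("qty", [2.0, 3.0]),
})
imu.materialize("price*qty", ["price", "qty"], lambda p, q: p * q)
print(imu.virtual_units["price*qty"].values)
```

A query that needs `price * qty` could then read the virtual unit directly instead of re-evaluating the expression per row.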
Abstract:
A method, apparatus, and system for OZIP, a data compression and decompression codec, are provided. OZIP utilizes a fixed-size static dictionary, which may be generated from a random sampling of input data to be compressed. Compression by direct token encoding to the static dictionary streamlines the encoding and avoids expensive conditional branching, facilitating hardware implementation and high parallelism. By bounding token definition sizes and static dictionary sizes to hardware architecture constraints such as word size or processor cache size, hardware implementation can be made fast and cost-effective. For example, decompression may be accelerated by using SIMD instruction processor extensions. A highly granular block mapping in optional stored metadata allows compressed data to be accessed quickly at random, bypassing the processing overhead of dynamic dictionaries. Thus, OZIP can support low latency random data access for highly random workloads, such as for OLTP systems.
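The general idea of token encoding against a fixed-size static dictionary can be sketched as below. This is not the OZIP format: the dictionary size, the entry-length bound, the frequency-based dictionary construction, and the greedy longest-match encoder are all assumptions chosen to keep the example short.

```python
from collections import Counter
from typing import List

MAX_ENTRY_LEN = 4   # assumed bound on entry length (e.g. one machine word)
DICT_SIZE = 512     # assumed fixed dictionary size


def build_dictionary(sample: bytes) -> List[bytes]:
    """Build a static dictionary from a caller-supplied sample of the input."""
    counts = Counter(
        sample[i:i + n]
        for n in range(2, MAX_ENTRY_LEN + 1)
        for i in range(len(sample) - n + 1)
    )
    # Every single byte is always present, so any input can be encoded.
    entries = [bytes([b]) for b in range(256)]
    entries += [gram for gram, _ in counts.most_common(DICT_SIZE - 256)]
    return entries


def compress(data: bytes, entries: List[bytes]) -> List[int]:
    index = {entry: token for token, entry in enumerate(entries)}
    tokens, pos = [], 0
    while pos < len(data):
        # Greedy longest match; length 1 always succeeds.
        for n in range(min(MAX_ENTRY_LEN, len(data) - pos), 0, -1):
            token = index.get(data[pos:pos + n])
            if token is not None:
                tokens.append(token)
                pos += n
                break
    return tokens


def decompress(tokens: List[int], entries: List[bytes]) -> bytes:
    # Decoding is a table lookup per token, with no dictionary updates.
    return b"".join(entries[t] for t in tokens)


data = b"abcabcabcabc"
entries = build_dictionary(data)
tokens = compress(data, entries)
print(len(tokens), decompress(tokens, entries) == data)
```

Because the dictionary never changes during decoding, any token can be decoded independently of the others, which is the property that makes random access and parallel decoding straightforward in this kind of scheme.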
Abstract:
A method for accelerating queries using dynamically generated columnar data in a flash cache is provided. In an embodiment, a method comprises a storage device receiving a first request for data that is stored in the storage device in a base major format in one or more primary storage devices. The storage device comprises a cache. The base major format is any one of: a row-major format, a column-major format, and a hybrid-columnar format. Based on one or more first criteria, it is determined whether to rewrite the data into rewritten data in a rewritten major format. In response to determining to rewrite the data into rewritten data in a rewritten major format, the storage device rewrites at least a portion of the data into particular rewritten data in the rewritten major format. The rewritten data is stored in the cache.
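A simplified sketch of the rewrite decision follows. It assumes a row-major base format, a column-major rewritten format, and a single criterion (a scan-count threshold); the class `StorageDevice` and the parameter `rewrite_after` are hypothetical.

```python
class StorageDevice:
    """Hypothetical storage device with a cache for rewritten data."""

    def __init__(self, rows_by_region, rewrite_after=2):
        self.primary = rows_by_region   # region id -> list of row tuples (row-major)
        self.cache = {}                 # region id -> list of columns (column-major)
        self.scans = {}                 # region id -> number of requests seen
        self.rewrite_after = rewrite_after

    def read_column(self, region, col):
        self.scans[region] = self.scans.get(region, 0) + 1
        if region in self.cache:
            return self.cache[region][col]
        rows = self.primary[region]
        # Assumed criterion: rewrite once a region has been requested often enough.
        if self.scans[region] >= self.rewrite_after:
            self.cache[region] = [list(c) for c in zip(*rows)]
            return self.cache[region][col]
        return [row[col] for row in rows]


dev = StorageDevice({0: [(1, "a"), (2, "b")]})
print(dev.read_column(0, 0))   # served from the row-major base data
print(dev.read_column(0, 1))   # triggers the rewrite, served from the cache
```

Later column reads for the same region are answered from the cached columnar copy without touching the row-major data.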
Abstract:
A method, apparatus, and system for policy driven data placement and information lifecycle management in a database management system are provided. A user or database application can specify declarative policies that define the movement and transformation of stored database objects. The policies are associated with a database object and may also be inherited. A policy defines, for a database object, an archiving action to be taken, a scope, and a condition before the archiving action is triggered. Archiving actions may include compression, data movement, table clustering, and other actions to place the database object into an appropriate storage tier for a lifecycle phase of the database object. Conditions may optionally invoke user-defined functions, and may be based on access statistics specified at the row level and may use segment or block level heatmaps. Policy evaluation occurs periodically in the background, with actions queued as tasks for a task scheduler.
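The policy structure described above (an action, a scope, a condition, and inheritance from a parent object) might be modeled as in the following sketch. The names `Policy`, `effective_policies`, and `evaluate`, and the statistic `days_since_write`, are illustrative assumptions rather than the system's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Policy:
    action: str                                  # e.g. "compress", "move"
    scope: str                                   # e.g. "segment", "row"
    condition: Callable[[Dict[str, int]], bool]  # evaluated against access statistics


def effective_policies(obj: Optional[str],
                       policies: Dict[str, List[Policy]],
                       parent: Dict[str, str]) -> List[Policy]:
    """Use the object's own policies, else inherit from the nearest ancestor."""
    while obj is not None:
        if obj in policies:
            return policies[obj]
        obj = parent.get(obj)
    return []


def evaluate(objects: List[str],
             policies: Dict[str, List[Policy]],
             parent: Dict[str, str],
             stats: Dict[str, Dict[str, int]]) -> List[Tuple[str, str]]:
    """One background evaluation pass; returns tasks to hand to a scheduler."""
    tasks = []
    for obj in objects:
        for policy in effective_policies(obj, policies, parent):
            if policy.condition(stats[obj]):
                tasks.append((obj, policy.action))
    return tasks


policies = {"sales": [Policy("compress", "segment",
                             lambda s: s["days_since_write"] > 30)]}
parent = {"sales_2019": "sales", "sales_2024": "sales"}
stats = {"sales_2019": {"days_since_write": 400},
         "sales_2024": {"days_since_write": 2}}
print(evaluate(["sales_2019", "sales_2024"], policies, parent, stats))
```

Here both partitions inherit the table-level policy, but only the partition that has gone unmodified for more than 30 days produces a queued task.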
Abstract:
Techniques for activity tracking, data classification, and in-database archiving are described. Activity tracking refers to techniques that collect statistics related to user access patterns, such as the frequency or recency with which users access particular database elements. The statistics gathered through activity tracking can be supplied to data classification techniques to automatically classify the database elements or to assist users with manually classifying the database elements. Then, once the database elements have been classified, in-database archiving techniques can be employed to move database elements to different storage tiers based on the classifications. Although the techniques related to activity tracking, data classification, and in-database archiving may be used together as described above, each technique may also be practiced separately.
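The three stages can be sketched together as follows. The class `ActivityTracker`, the recency thresholds, the labels "hot", "warm", and "cold", and the tier names are all assumptions made for illustration.

```python
class ActivityTracker:
    """Hypothetical collector of recency and frequency statistics."""

    def __init__(self):
        self.last_access = {}   # element -> day of most recent access
        self.count = {}         # element -> number of accesses

    def record(self, element, day):
        self.last_access[element] = day
        self.count[element] = self.count.get(element, 0) + 1


def classify(tracker, element, today, hot_within=7, warm_within=90):
    """Classify an element by how recently it was accessed (assumed rule)."""
    last = tracker.last_access.get(element)
    if last is None:
        return "cold"
    age = today - last
    if age <= hot_within:
        return "hot"
    if age <= warm_within:
        return "warm"
    return "cold"


# Assumed mapping from classification to storage tier.
TIER = {"hot": "flash", "warm": "disk", "cold": "archive"}

tracker = ActivityTracker()
tracker.record("orders_2024", day=360)
tracker.record("orders_2020", day=10)
for element in ("orders_2024", "orders_2020", "orders_2015"):
    print(element, TIER[classify(tracker, element, today=365)])
```

Each stage is usable on its own: the tracker only gathers statistics, `classify` only needs recency data from some source, and the tier mapping only needs a classification.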
Abstract:
Techniques are provided for more efficiently using the bandwidth of the I/O path between a CPU and volatile memory during the performance of database operations. Relational data from a relational table is stored in volatile memory as column vectors, where each column vector contains values for a particular column of the table. A binary-comparable format may be used to represent each value within a column vector, regardless of the data type associated with the column. The column vectors may be compressed and/or encoded while in volatile memory, and decompressed/decoded on-the-fly within the CPU. Alternatively, the CPU may be designed to perform operations directly on the compressed and/or encoded column vector data. In addition, techniques are described that enable the CPU to perform vector processing operations on the column vector values.
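One common way to obtain a binary-comparable representation is shown below for signed 64-bit integers: adding a bias and storing the result big-endian makes plain byte-wise comparison agree with numeric order. The abstract does not specify its encoding, so this particular scheme and the helper names `encode_int` and `scan_less_than` are assumptions.

```python
import struct
from typing import List


def encode_int(value: int) -> bytes:
    """Encode a signed 64-bit integer so that byte order matches numeric order."""
    # Shifting by 2**63 maps the signed range onto the unsigned range;
    # big-endian layout puts the most significant byte first.
    return struct.pack(">Q", value + (1 << 63))


def decode_int(blob: bytes) -> int:
    return struct.unpack(">Q", blob)[0] - (1 << 63)


def scan_less_than(column_vector: List[bytes], bound: bytes) -> List[int]:
    """Return row positions whose value is below `bound`, comparing raw bytes only."""
    return [i for i, encoded in enumerate(column_vector) if encoded < bound]


values = [-5, 3, 100, -200]
column_vector = [encode_int(v) for v in values]
print(scan_less_than(column_vector, encode_int(0)))
print([decode_int(b) for b in sorted(column_vector)])
```

The scan never decodes a value or inspects the column's data type, which is the property that lets the same comparison logic run over any column stored in such a format.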
Abstract:
Techniques are provided for determining which data item version to supply to a query. According to the techniques, the determination is made by associating a new field, which indicates the time a data item version was current, with each data item version; associating a new field with each query, which indicates the last change that the query must see made by the transaction to which the query belongs; and determining which data item version to use to answer the query based, in part, on a comparison between the values of the two new fields.
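One possible reading of the comparison between the two fields is sketched below. It assumes both fields are integer change numbers on the same scale and that a query should receive the newest version that is not later than the last change it must see; the names `ItemVersion`, `Query`, and `pick_version` are hypothetical, and the abstract states only that the comparison is part of the determination.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ItemVersion:
    value: str
    current_as_of: int        # assumed: change number at which this version became current


@dataclass
class Query:
    last_change_to_see: int   # assumed: last change by its own transaction the query must see


def pick_version(versions: List[ItemVersion], query: Query) -> Optional[ItemVersion]:
    """Pick the newest version that is not later than what the query must see."""
    visible = [v for v in versions if v.current_as_of <= query.last_change_to_see]
    return max(visible, key=lambda v: v.current_as_of, default=None)


versions = [ItemVersion("v1", 1), ItemVersion("v2", 4), ItemVersion("v3", 7)]
chosen = pick_version(versions, Query(last_change_to_see=5))
print(chosen.value if chosen else None)
```

Under these assumptions, a query that must see changes only up to number 5 is given `v2` and is shielded from the later change that produced `v3`.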