Abstract:
Techniques are provided for storing an in-memory unit (IMU) in a lower storage tier and copying the IMU to DRAM when needed for query processing. Techniques are also provided for copying IMUs to lower tiers of storage when evicted from the cache of higher tiers of storage. Techniques are provided for implementing functionality of IMUs within a storage system, to enable database servers to push tasks, such as filtering, to the storage system where the storage system may access IMUs within its own memory to perform the tasks. Metadata associated with a set of data may be used to indicate whether an IMU for the data should be created by the database server machine or within the storage system.
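The tiering behavior described above can be illustrated with a minimal sketch. It assumes a least-recently-used policy for the DRAM tier and a plain dictionary standing in for the lower tier; the class name `TieredImuCache` and its methods are hypothetical and are not taken from the abstract.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredImuCache:
    """Hypothetical two-tier IMU cache: a small DRAM tier over a lower tier."""

    def __init__(self, dram_capacity: int) -> None:
        if dram_capacity < 1:
            raise ValueError("dram_capacity must be at least 1")
        self.capacity = dram_capacity
        self.dram: "OrderedDict[str, bytes]" = OrderedDict()  # assumed LRU order
        self.lower: Dict[str, bytes] = {}  # stands in for flash or disk

    def put(self, key: str, imu: bytes) -> None:
        """Place an IMU in DRAM; evicted IMUs are copied to the lower tier."""
        self.dram[key] = imu
        self.dram.move_to_end(key)
        while len(self.dram) > self.capacity:
            old_key, old_imu = self.dram.popitem(last=False)
            self.lower[old_key] = old_imu

    def get(self, key: str) -> Optional[bytes]:
        """Return the IMU, copying it up to DRAM if it lives in the lower tier."""
        if key in self.dram:
            self.dram.move_to_end(key)
            return self.dram[key]
        if key in self.lower:
            self.put(key, self.lower[key])  # the lower tier keeps its copy
            return self.dram[key]
        return None
```

In this sketch the lower tier retains its copy after a read, so a later eviction of the same IMU overwrites an identical entry rather than losing data.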
Abstract:
Techniques are provided for maintaining data persistently in one format, but making that data available to a database server in more than one format. For example, one of the formats in which the data is made available for query processing is based on the on-disk format, while another of the formats in which the data is made available for query processing is independent of the on-disk format. Data that is in the format that is independent of the disk format may be maintained exclusively in volatile memory to reduce the overhead associated with keeping the data in sync with the on-disk format copies of the data.
Abstract:
One embodiment of the present invention provides a system for automatically classifying data in a database. During operation, the system receives and executes a database operation. Next, the system automatically determines if any data was modified as a result of executing the database operation. If so, for each data item that was modified, the system automatically determines if the data item is associated with a classification-rule. If so, the system automatically reclassifies the data item according to the classification-rule. If not, the system leaves a classification of the data item unchanged.
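The reclassification flow described above can be sketched as follows. This is an illustration under stated assumptions: a database operation is modeled as a dictionary of key-to-value updates, and a classification-rule is modeled as a function keyed by item; `Item`, `Rule`, and `execute_and_reclassify` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    key: str
    value: int
    classification: str = "unclassified"


# Hypothetical rule type: computes a classification from the item's state.
Rule = Callable[[Item], str]


def execute_and_reclassify(
    items: Dict[str, Item],
    updates: Dict[str, int],
    rules: Dict[str, Rule],
) -> List[str]:
    """Apply updates, then reclassify only modified items that have a rule."""
    modified: List[str] = []
    for key, new_value in updates.items():
        item = items[key]
        if item.value != new_value:  # the operation actually changed the data
            item.value = new_value
            modified.append(key)
    for key in modified:
        rule = rules.get(key)
        if rule is not None:
            items[key].classification = rule(items[key])
        # No rule: the existing classification is left unchanged.
    return modified
```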
Abstract:
A method and apparatus for efficiently processing data in various formats in a single instruction multiple data ("SIMD") architecture are presented. Specifically, a method to unpack fixed-width bit values in a bit stream to a fixed-width byte stream in a SIMD architecture is presented. A method to unpack variable-length byte packed values in a byte stream in a SIMD architecture is presented. A method to decompress a run length encoded compressed bit-vector in a SIMD architecture is presented. A method to return the offset of each bit set to one in a bit-vector in a SIMD architecture is presented. A method to fetch bits from a bit-vector at specified offsets relative to a base in a SIMD architecture is presented. A method to compare values stored in two SIMD registers is presented.
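Two of the operations listed above can be pinned down with a scalar reference sketch. It is not a SIMD implementation; it only shows the input-to-output mapping that a vectorized version would need to reproduce. The most-significant-bit-first bit order and the function names are assumptions made for illustration.

```python
from typing import List


def unpack_fixed_width(packed: bytes, width: int, count: int) -> List[int]:
    """Unpack `count` values of `width` bits each (MSB-first, an assumption)."""
    total_bits = len(packed) * 8
    if width < 1 or count * width > total_bits:
        raise ValueError("bit stream too short for requested values")
    stream = int.from_bytes(packed, "big")
    mask = (1 << width) - 1
    values = []
    for i in range(count):
        shift = total_bits - (i + 1) * width  # position of the i-th value
        values.append((stream >> shift) & mask)
    return values


def set_bit_offsets(bitvector: bytes) -> List[int]:
    """Return the offset of every bit set to one (bit 0 = MSB of byte 0)."""
    return [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(bitvector)
        for bit in range(8)
        if byte & (0x80 >> bit)
    ]
```

For example, the two bytes `0xAB 0x82` hold the 3-bit values 5, 2, 7, 0, 1 followed by one padding bit.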
Abstract:
Techniques are provided for more efficiently using the bandwidth of the I/O path between a CPU and volatile memory during the performance of database operations. Relational data from a relational table is stored in volatile memory as column vectors, where each column vector contains values for a particular column of the table. A binary-comparable format may be used to represent each value within a column vector, regardless of the data type associated with the column. The column vectors may be compressed and/or encoded while in volatile memory, and decompressed/decoded on-the-fly within the CPU. Alternatively, the CPU may be designed to perform operations directly on the compressed and/or encoded column vector data. In addition, techniques are described that enable the CPU to perform vector processing operations on the column vector values.
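A minimal sketch of the column-vector layout and of one possible binary-comparable encoding follows. The abstract does not specify an encoding; the sketch assumes signed 64-bit integers stored big-endian with an offset of 2^63, which makes byte-wise comparison agree with numeric order. All function names are hypothetical.

```python
import struct
from typing import Dict, List, Sequence, Tuple

_BIAS = 1 << 63  # shifts the signed 64-bit range onto the unsigned range


def encode_int64(value: int) -> bytes:
    """Encode so that comparing the bytes compares the integers (assumed scheme)."""
    return struct.pack(">Q", value + _BIAS)


def decode_int64(data: bytes) -> int:
    return struct.unpack(">Q", data)[0] - _BIAS


def to_column_vectors(
    rows: Sequence[Tuple[int, ...]], column_names: Sequence[str]
) -> Dict[str, List[bytes]]:
    """Pivot row-major tuples into one encoded vector per column."""
    return {
        name: [encode_int64(row[i]) for row in rows]
        for i, name in enumerate(column_names)
    }
```

With this encoding a filter such as `salary > 1000` can be evaluated by comparing raw bytes against `encode_int64(1000)` without decoding each value.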
Abstract:
A computer (200) modifies data inside an object (201) in a database (210) without modifying other data in the remainder of the object (201). Insertion of new data (208C) at a specified location in the object (201) does not require movement of existing data in the object (201). Instead, the computer (200) is programmed to insert new data at a physical end of the object (201), and modify metadata (230) based on the specified location. Similarly, deletion of existing data from a specified location in the object (201) is performed without movement of other data in the object (201), by updating the metadata (230). The computer (200) uses the metadata (230) when reading from the object (201), so that the new data (208C) is automatically read whenever the specified location is accessed. The computer (200) may optionally output a handle that is static, relative to other insertions and deletions, to identify specific data within the object (201), for use in building indexes on the object (201).
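The append-at-end scheme described above resembles a piece table, and a minimal sketch under that assumption follows. The metadata is modeled as an ordered list of (physical offset, length) pairs; the class `AppendOnlyObject` and its methods are hypothetical, and the static handle mentioned in the abstract is not modeled.

```python
from typing import List, Tuple


class AppendOnlyObject:
    """Hypothetical object whose edits touch only metadata and the physical end."""

    def __init__(self, initial: bytes = b"") -> None:
        self._storage = bytearray(initial)  # existing bytes are never moved
        self._pieces: List[Tuple[int, int]] = [(0, len(initial))] if initial else []

    def _split_at(self, position: int) -> int:
        """Ensure a piece boundary at a logical position; return the piece index."""
        if position < 0:
            raise IndexError("negative position")
        logical = 0
        for i, (offset, length) in enumerate(self._pieces):
            if position == logical:
                return i
            if position < logical + length:
                head = position - logical
                self._pieces[i:i + 1] = [(offset, head), (offset + head, length - head)]
                return i + 1
            logical += length
        if position == logical:
            return len(self._pieces)
        raise IndexError("position beyond end of object")

    def insert(self, position: int, data: bytes) -> None:
        if not data:
            return
        index = self._split_at(position)
        offset = len(self._storage)
        self._storage += data  # new data goes at the physical end
        self._pieces.insert(index, (offset, len(data)))

    def delete(self, position: int, length: int) -> None:
        start = self._split_at(position)
        end = self._split_at(position + length)
        del self._pieces[start:end]  # metadata only; storage is untouched

    def read(self) -> bytes:
        return b"".join(bytes(self._storage[o:o + n]) for o, n in self._pieces)
```

Inserting `b","` at logical position 5 of `b"hello world"` leaves the physical bytes as `b"hello world,"` while `read()` returns `b"hello, world"`.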
Abstract:
Techniques are described for characterizing and summarizing seasonal patterns detected within a time series. According to an embodiment, a set of time series data is analyzed to identify a plurality of instances of a season, where each instance corresponds to a respective sub-period within the season. A first set of instances from the plurality of instances is associated with a particular class of seasonal pattern. After classifying the first set of instances, a second set of instances may remain unclassified or otherwise may not be associated with the particular class of seasonal pattern. Based on the first and second sets of instances, a summary may be generated that identifies one or more stretches of time that are associated with the particular class of seasonal pattern. The one or more stretches of time may span at least one sub-period corresponding to at least one instance in the second set of instances.
Abstract:
A method, apparatus, and system for tracking row and object database activity into block level heatmaps are provided. Database activity including reads, writes, and creates can be tracked by a database management system at the finest possible level of granularity, namely the row and object level. To efficiently record the tracked database activity, a two-part structure is described for writing the activity into heatmaps. A hierarchical in-memory component may use a dynamically allocated sparse pool of bitmap blocks. Periodically, the in-memory component is persisted to a stored representation component, sharable with multiple database instances, which may include consolidated last access times and/or a history of heatmap snapshots to reflect access over time. The heatmaps may then be externalized to database users and applications to provide and support a variety of features.
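The two-part structure described above can be sketched in simplified form. The sketch assumes a flat, rather than hierarchical, sparse pool of 64-bit bitmap chunks allocated on first touch, a fixed number of rows per block, and a dictionary of last access times standing in for the stored representation. Every name here is hypothetical.

```python
from typing import Dict, List


class SparseHeatmap:
    """Hypothetical in-memory heatmap: one bit per data block, sparse chunks."""

    BITS_PER_CHUNK = 64

    def __init__(self, rows_per_block: int) -> None:
        self.rows_per_block = rows_per_block  # assumed fixed for the sketch
        self._chunks: Dict[int, int] = {}  # chunk index -> bitmap, allocated lazily

    def record_row_access(self, row_id: int) -> None:
        """Fold a row-level access into the bit for the block holding that row."""
        block_no = row_id // self.rows_per_block
        chunk, bit = divmod(block_no, self.BITS_PER_CHUNK)
        self._chunks[chunk] = self._chunks.get(chunk, 0) | (1 << bit)

    def touched_blocks(self) -> List[int]:
        return sorted(
            chunk * self.BITS_PER_CHUNK + bit
            for chunk, bits in self._chunks.items()
            for bit in range(self.BITS_PER_CHUNK)
            if (bits >> bit) & 1
        )

    def flush(self, persisted: Dict[int, float], now: float) -> None:
        """Persist last access times for touched blocks, then reset memory."""
        for block_no in self.touched_blocks():
            persisted[block_no] = now
        self._chunks.clear()
```

Only chunks that contain at least one touched block consume memory, which is the point of the sparse pool in this sketch.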
Abstract:
For automatic data placement of database data, a plurality of access-tracking data is maintained. The plurality of access-tracking data respectively corresponds to a plurality of data rows that are managed by a database server. While the database server is executing normally, it is automatically determined whether a data row, which is stored in a first set of one or more data blocks, has been recently accessed based on the access-tracking data that corresponds to that data row. After determining that the data row has been recently accessed, the data row is automatically moved from the first set of one or more data blocks to one or more hot data blocks that are designated for storing those data rows, from the plurality of data rows, that have been recently accessed.
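The placement step described above can be sketched as follows. The sketch assumes that access-tracking data is a per-row last access timestamp, that "recently accessed" means within a fixed time window, and that a single designated hot block receives the moved rows. The function `relocate_hot_rows` and its parameters are hypothetical.

```python
from typing import Dict, List


def relocate_hot_rows(
    blocks: Dict[str, List[int]],
    last_access: Dict[int, float],
    now: float,
    window: float,
    hot_block_id: str = "hot",
) -> List[int]:
    """Move rows accessed within `window` of `now` into the designated hot block."""
    hot_rows = blocks.setdefault(hot_block_id, [])
    moved: List[int] = []
    for block_id, rows in blocks.items():
        if block_id == hot_block_id:
            continue
        keep: List[int] = []
        for row_id in rows:
            accessed = last_access.get(row_id)
            if accessed is not None and now - accessed <= window:
                moved.append(row_id)  # recently accessed: relocate
            else:
                keep.append(row_id)  # cold or never tracked: stays put
        rows[:] = keep
    hot_rows.extend(moved)
    return moved
```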