Abstract:
A corpus of information describing queries used to access a transactional data store may be used to identify analytical relationships that are not explicitly defined in a schema or supplied by a user. Join relationships may be identified based on field coincidence in elements of queries in the corpus. Join relationships may be indicative of dimensions and attributes of a dimension. Hierarchy levels for a dimension may be identified based on factors including data type, reference in an aggregating clause, and reference in a grouping clause.
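As a rough illustration of field coincidence, the sketch below counts how often two qualified fields appear together in an equality predicate across a small query corpus, and which fields appear in grouping clauses. The queries, table names, and the regular-expression approach are assumptions made for this example only; a practical system would more likely use a SQL parser and resolve table aliases.

```python
import re
from collections import Counter

# Hypothetical query corpus; all table and column names are invented.
QUERIES = [
    "SELECT c.region, SUM(o.total) FROM orders o JOIN customers c "
    "ON o.customer_id = c.id GROUP BY c.region",
    "SELECT c.name FROM orders o JOIN customers c "
    "ON o.customer_id = c.id WHERE o.total > 100",
    "SELECT p.category, COUNT(*) FROM orders o JOIN products p "
    "ON o.product_id = p.id GROUP BY p.category",
]

# Equality between two qualified fields, e.g. "o.customer_id = c.id".
EQUALITY = re.compile(r"(\w+\.\w+)\s*=\s*(\w+\.\w+)")
# Comma-separated field list following GROUP BY.
GROUP_BY = re.compile(r"GROUP BY\s+([\w.]+(?:\s*,\s*[\w.]+)*)", re.IGNORECASE)


def candidate_joins(queries):
    """Count how often each unordered pair of fields coincides in an equality."""
    counts = Counter()
    for query in queries:
        for left, right in EQUALITY.findall(query):
            counts[tuple(sorted((left, right)))] += 1
    return counts


def grouped_fields(queries):
    """Count fields referenced in grouping clauses (possible hierarchy levels)."""
    counts = Counter()
    for query in queries:
        for field_list in GROUP_BY.findall(query):
            for name in field_list.split(","):
                counts[name.strip()] += 1
    return counts
```

Pairs with higher counts may be treated as stronger join candidates, and fields that recur in grouping clauses may be treated as candidate hierarchy levels; the thresholds for either decision are left open here.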
Abstract:
A network-based services provider may reserve and provision primary resource instance capacity for a given service (e.g., enough compute instances, storage instances, or other virtual resource instances to implement the service) in one or more availability zones, and may designate contingency resource instance capacity for the service in another availability zone (without provisioning or reserving the contingency instances for the exclusive use of the service). For example, the service provider may provision resource instance(s) for a database engine head node in one availability zone and designate resource instance capacity for another database engine head node in another availability zone without instantiating the other database engine head node. While the service operates as expected using the primary resource instance capacity, the contingency resource capacity may be leased to other entities on a spot market. Leases for contingency instance capacity may be revoked when needed for the given service (e.g., during failover).
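The leasing and revocation of contingency capacity may be sketched as follows. The `ContingencyPool` class, its method names, and the policy of revoking the most recent lease first are all hypothetical choices for illustration; the abstract does not specify a revocation order.

```python
class ContingencyPool:
    """Designated (not reserved) capacity in a secondary availability zone."""

    def __init__(self, capacity):
        self.capacity = capacity       # total instance slots designated
        self.spot_leases = []          # lessee ids currently holding a slot
        self.service_instances = 0     # slots taken over by the service

    def lease_spot(self, lessee):
        """Lease one slot on the spot market if any slot is free."""
        if len(self.spot_leases) + self.service_instances >= self.capacity:
            return False
        self.spot_leases.append(lessee)
        return True

    def failover(self, needed):
        """Claim `needed` slots for the service, revoking leases as required.

        Returns the list of lessees whose leases were revoked.
        """
        if needed > self.capacity - self.service_instances:
            raise ValueError("not enough contingency capacity designated")
        free = self.capacity - self.service_instances - len(self.spot_leases)
        revoked = []
        while free < needed:
            # Assumed policy: revoke the most recently granted lease first.
            revoked.append(self.spot_leases.pop())
            free += 1
        self.service_instances += needed
        return revoked
```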
Abstract:
Self-describing data blocks of a minimum atomic write size may be stored for a data store. Data may be received for storage in one of a plurality of data blocks at a persistent storage device, each data block being equivalent in size to a minimum atomic write size for the persistent storage device. Metadata may be generated for the data that includes an error detection code which is generated for the data and the metadata together. The data and the metadata are sent to the persistent storage device to be stored together in the data block. An individual atomic write operation may write the data and the metadata together in the data block. When the data block is accessed, the error detection code is applicable to detect errors. The metadata may also be applicable to determine whether the data is stored for a currently assigned purpose or a previously assigned purpose of the data block.
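One possible block layout is sketched below. The 4096-byte block size, the header fields (`purpose_id`, `sequence`, payload length), and the use of CRC-32 as the error detection code are assumptions for this example; the abstract does not name a specific size, field set, or code. The checksum is computed over the metadata (with the checksum field zeroed) and the padded data together.

```python
import struct
import zlib

BLOCK_SIZE = 4096  # assumed minimum atomic write size of the device
# Header: purpose id (uint32), sequence (uint64), payload length (uint32), CRC-32 (uint32)
HEADER = struct.Struct("<IQII")


def build_block(purpose_id, sequence, payload):
    """Pack metadata and data into one block that fits a single atomic write."""
    capacity = BLOCK_SIZE - HEADER.size
    if len(payload) > capacity:
        raise ValueError("payload does not fit in one block")
    padded = payload.ljust(capacity, b"\x00")
    # The checksum covers the metadata and the data together.
    unsigned_header = HEADER.pack(purpose_id, sequence, len(payload), 0)
    crc = zlib.crc32(unsigned_header + padded)
    return HEADER.pack(purpose_id, sequence, len(payload), crc) + padded


def read_block(block, expected_purpose):
    """Verify a block; return its payload, or None if it belongs to another purpose."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("unexpected block size")
    purpose_id, sequence, length, crc = HEADER.unpack_from(block)
    unsigned_header = HEADER.pack(purpose_id, sequence, length, 0)
    if zlib.crc32(unsigned_header + block[HEADER.size:]) != crc:
        raise ValueError("checksum mismatch: block is corrupt or torn")
    if purpose_id != expected_purpose:
        return None  # written under a previously assigned purpose
    return block[HEADER.size:HEADER.size + length]
```

Because the metadata travels inside the same atomic write as the data, a reader in this sketch never sees data paired with metadata from a different write.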
Abstract:
A multi-column index is generated based on an interleaving of data bits for selectivity for efficient processing of data in a relational database system. Two or more columns may be identified for inclusion in the multi-column index for a relational database table. Based, at least in part, on the interleaving of data bits for selectivity from the identified columns, a multi-column index is generated for the relational database table that provides a respective index value for each entry in the relational database table. The entries of the relational database table may then be stored according to the index values of the multi-column index.
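A minimal sketch of bit interleaving for two or more integer columns follows. It takes bits round-robin from each column, most significant bit first, which is one common way to interleave (often called a Z-order or Morton encoding); the abstract does not state that this exact ordering is used, and the function names and the fixed bit width are assumptions.

```python
def interleave_bits(values, bits_per_value=8):
    """Build one index value by interleaving the bits of each column value."""
    for value in values:
        if value < 0 or value >= (1 << bits_per_value):
            raise ValueError("value does not fit in bits_per_value bits")
    index = 0
    # Walk from the most significant bit down, taking one bit per column.
    for bit in range(bits_per_value - 1, -1, -1):
        for value in values:
            index = (index << 1) | ((value >> bit) & 1)
    return index


def cluster_rows(rows, bits_per_value=8):
    """Order table entries by their interleaved multi-column index value."""
    return sorted(rows, key=lambda row: interleave_bits(row, bits_per_value))
```

With 3-bit columns, `cluster_rows([(0, 7), (1, 0)], 3)` places `(1, 0)` before `(0, 7)`, whereas a plain lexicographic sort would do the opposite. This shows that no single column dominates the ordering.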
Abstract:
A log-structured data store implementing data backup may implement variable data replication. Write requests may be received at different storage nodes maintaining respective replicas of a portion of a log for data maintained in the log-structured data store. Log records indicating the write requests may be stored in the respective replicas of the log portions at the different storage nodes. The log records may be sent to a backup data store to be durably persisted as part of an archived version of the log. At some of the storage nodes, in response to determining that the log records have been durably persisted in the backup data store, storage space for the log records may be reclaimed. In other remaining storage nodes, the log records may be retained and made accessible for servicing read requests.
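The reclaim-or-retain behavior may be sketched with in-memory stand-ins for the storage nodes and the backup data store. The class names, the `retain_after_backup` flag, and the use of log sequence numbers (LSNs) as keys are hypothetical; how nodes are chosen to retain records is not specified in the abstract.

```python
class BackupStore:
    """Stand-in for the backup data store holding the archived log."""

    def __init__(self):
        self.archive = {}  # LSN -> log record

    def persist(self, records):
        """Archive the records and return the LSNs now durably persisted."""
        self.archive.update(records)
        return set(records)


class StorageNode:
    """Stand-in for a storage node holding one replica of a log portion."""

    def __init__(self, name, retain_after_backup):
        self.name = name
        self.retain_after_backup = retain_after_backup
        self.log = {}  # LSN -> log record

    def append(self, lsn, record):
        self.log[lsn] = record

    def on_backup_persisted(self, persisted_lsns):
        """Reclaim space for archived records unless this node retains them."""
        if self.retain_after_backup:
            return
        for lsn in persisted_lsns:
            self.log.pop(lsn, None)

    def read(self, lsn):
        """Return the record for a read request, or None if it was reclaimed."""
        return self.log.get(lsn)
```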
Abstract:
Code may be dynamically routed to computing resources for execution. Code may be received for execution on behalf of a client. Execution criteria for the code may be determined and computing resources that satisfy the execution criteria may be identified. The identified computing resources may then be procured for executing the code and then the code may be routed to the procured computing resources for execution. Permissions or authorization to execute the code may be shared to ensure that computing resources executing the code have the same permissions or authorization when executing the code.
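The routing steps may be sketched as a simple match of execution criteria against available resources. The `ComputeResource` fields, the two criteria (`memory_mb` and `runtime`), and the first-match selection are assumptions for illustration; procurement in a real system would involve launching or reserving resources rather than picking from a list.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeResource:
    name: str
    memory_mb: int
    runtimes: frozenset
    permissions: set = field(default_factory=set)
    received: list = field(default_factory=list)


def route_code(code, criteria, client_permissions, resources):
    """Send code to the first resource that satisfies the execution criteria."""
    for resource in resources:
        enough_memory = resource.memory_mb >= criteria["memory_mb"]
        has_runtime = criteria["runtime"] in resource.runtimes
        if enough_memory and has_runtime:
            # Share the client's permissions so the code runs with the same rights.
            resource.permissions = set(client_permissions)
            resource.received.append(code)
            return resource
    raise LookupError("no resource satisfies the execution criteria")
```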
Abstract:
Proxy-based scaling may be performed for databases. A proxy may be implemented for a database that can establish a connection between the proxy and a database engine to perform database queries received from a client at the proxy. A scaling event may be detected for the database, responsive to which the proxy may establish a connection with a new database engine which may, in some embodiments, have different capabilities or resources that address the features or criteria that triggered the scaling event. Session state may be copied from the database engine to the new database engine so that the new database engine may be able to provide access to the database on behalf of requests received from the client through the proxy.
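The hand-off between engines may be sketched as below. `DatabaseEngine` and `Proxy` are hypothetical in-memory stand-ins, session state is modeled as a plain dictionary, and detection of the scaling event itself is left out.

```python
class DatabaseEngine:
    """Stand-in for a database engine with some amount of capacity."""

    def __init__(self, name, capacity_units):
        self.name = name
        self.capacity_units = capacity_units
        self.session_state = {}

    def execute(self, query):
        return f"{self.name} ran: {query}"


class Proxy:
    """Keeps the client-facing endpoint stable while engines change behind it."""

    def __init__(self, engine):
        self.engine = engine

    def set_session(self, key, value):
        self.engine.session_state[key] = value

    def query(self, sql):
        return self.engine.execute(sql)

    def handle_scaling_event(self, new_engine):
        """Copy session state to the new engine, then switch the connection."""
        new_engine.session_state = dict(self.engine.session_state)
        self.engine = new_engine
```

The client in this sketch keeps calling `Proxy.query` before and after the switch, so it does not need to reconnect or re-establish its session settings.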
Abstract:
History for data objects may be maintained to detect data events. An indication of an Extract, Transform, Load (ETL) process applied to one or more source data objects to generate one or more transformed data objects may be received. History for the source data objects may be updated to include the transformed data objects and the ETL process that generated the transformed data objects. An evaluation of the update may be performed to determine whether an event associated with the data lineage is triggered. If the event is triggered, a notification of the event may be sent to one or more subscribers for the event.
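A small sketch of history tracking with event evaluation follows. The `LineageTracker` class, the predicate-based subscriptions, and the example event are assumptions made for illustration; the abstract does not describe how events are defined.

```python
from collections import defaultdict


class LineageTracker:
    """Maintains per-object history and notifies subscribers of events."""

    def __init__(self):
        # source object -> list of (ETL process name, transformed objects)
        self.history = defaultdict(list)
        # list of (predicate, callback) pairs
        self.subscriptions = []

    def subscribe(self, predicate, callback):
        """Register a callback for updates where predicate(source, history) holds."""
        self.subscriptions.append((predicate, callback))

    def record_etl(self, etl_name, sources, outputs):
        """Update history for each source, then evaluate the update for events."""
        for source in sources:
            entry = (etl_name, tuple(outputs))
            self.history[source].append(entry)
            for predicate, callback in self.subscriptions:
                if predicate(source, self.history[source]):
                    callback(source, entry)
```

For example, a subscriber might register the predicate `lambda source, history: len(history) >= 2` to be notified once a source object has fed two or more ETL processes.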
Abstract:
Data transformation workflows may be generated to transform data objects. A source data schema for a data object and a target data format or target data schema for a data object may be identified. A comparison of the source data schema and the target data format or schema may be made to determine what transformations can be performed to transform the data object into the target data format or schema. Code to execute the transformation operations may then be generated. The code may be stored for subsequent modification or execution.
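The compare-then-generate flow may be sketched as follows. Schemas are modeled as dictionaries mapping column names to one of the type names `int`, `float`, or `str`, and the three operations (`cast`, `add_default`, `drop`) are hypothetical; they stand in for whatever transformations a real comparison would identify. The generated code is returned as a string so that it can be stored, edited, or executed later.

```python
def plan_transformations(source_schema, target_schema):
    """Compare two schemas and list the operations needed to reach the target."""
    operations = []
    for column, target_type in target_schema.items():
        if column not in source_schema:
            operations.append(("add_default", column, target_type))
        elif source_schema[column] != target_type:
            operations.append(("cast", column, target_type))
    for column in source_schema:
        if column not in target_schema:
            operations.append(("drop", column, None))
    return operations


def generate_code(operations):
    """Emit Python source for a transform(record) function."""
    lines = ["def transform(record):", "    out = dict(record)"]
    for op, column, type_name in operations:
        if op == "cast":
            lines.append(f"    out[{column!r}] = {type_name}(out[{column!r}])")
        elif op == "add_default":
            # Calling the type with no arguments yields its zero value.
            lines.append(f"    out[{column!r}] = {type_name}()")
        elif op == "drop":
            lines.append(f"    out.pop({column!r}, None)")
    lines.append("    return out")
    return "\n".join(lines)
```

The returned source can be run with `exec(code, namespace)` and the resulting `namespace["transform"]` applied to each record. Executing generated code this way is only appropriate when the schemas come from a trusted source.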