Abstract:
Disclosed are various embodiments for distributing data items within a plurality of nodes. A data item that is subject to a data item update request is propagated from a master node to a plurality of slave nodes. The update of the data item is determined to be locality-based durable based at least in part on acknowledgements received from the slave nodes. Upon detection that the master node has failed, a new master candidate is determined via an election among the plurality of slave nodes.
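A minimal Python sketch of the durability check and election described above; the names (Ack, is_locality_durable, elect_new_master) and the tie-breaking rule are illustrative assumptions, not taken from the disclosure.

    # Hypothetical sketch of locality-based durability and master election.
    from dataclasses import dataclass

    @dataclass
    class Ack:
        node_id: str
        data_center: str          # locality of the acknowledging slave

    def is_locality_durable(acks, required_localities=2):
        """Treat an update as durable once acknowledgements span enough
        distinct localities (e.g. data centers), not just enough nodes."""
        return len({a.data_center for a in acks}) >= required_localities

    def elect_new_master(slaves):
        """Toy election: the slave with the most recent update sequence wins;
        ties are broken by node id for determinism."""
        return max(slaves, key=lambda s: (s["last_seq"], s["node_id"]))

    acks = [Ack("s1", "dc-east"), Ack("s2", "dc-west")]
    print(is_locality_durable(acks))                     # True
    slaves = [{"node_id": "s1", "last_seq": 41},
              {"node_id": "s2", "last_seq": 42}]
    print(elect_new_master(slaves)["node_id"])           # s2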
Abstract:
A system that implements a scalable data storage service may maintain tables in a non-relational data store on behalf of clients. The system may provide a Web services interface through which service requests are received, and an API usable to request that a table be created, deleted, or described; that an item be stored, retrieved, deleted, or its attributes modified; or that a table be queried (or scanned) with filtered items and/or their attributes returned. An asynchronous workflow may be invoked to create or delete a table. Items stored in tables may be partitioned and indexed using a simple or composite primary key. The system may not impose pre-defined limits on table size, and may employ a flexible schema. The service may provide a best-effort or committed throughput model. The system may automatically scale and/or re-partition tables in response to detecting workload changes, node failures, or other conditions or anomalies.
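The API surface described above might be sketched, under assumed operation names that are not the service's actual interface, roughly as follows:

    # Illustrative in-memory dispatcher for table/item/query style operations.
    class DataStoreService:
        def __init__(self):
            self.tables = {}              # table name -> {primary key -> item}

        def create_table(self, name):
            self.tables.setdefault(name, {})

        def delete_table(self, name):
            self.tables.pop(name, None)

        def put_item(self, table, key, item):
            self.tables[table][key] = item

        def get_item(self, table, key):
            return self.tables[table].get(key)

        def query(self, table, predicate, attributes=None):
            """Return filtered items, optionally projecting a subset of attributes."""
            for item in self.tables[table].values():
                if predicate(item):
                    yield {k: v for k, v in item.items()
                           if attributes is None or k in attributes}

    svc = DataStoreService()
    svc.create_table("orders")
    svc.put_item("orders", "o-1", {"id": "o-1", "total": 30, "status": "shipped"})
    print(list(svc.query("orders", lambda i: i["total"] > 10, attributes={"id"})))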
Abstract:
Partitions of a hosted computing service may be maintained on a computing node. Processing of requests to access the partition may be limited to constrain capacity utilization to a provisioned amount of capacity reserved for the partition. A second, additional amount of capacity may be associated with the partition and may reflect potential future changes to the provisioned amount of capacity. A sum of provisioned and additional capacities associated with partitions on a computing node may be calculated. The computing node may be ranked, relative to other computing nodes, for maintaining new or relocated partitions based on the sum.
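A short sketch of the ranking step, assuming hypothetical field names (provisioned, additional) for the two capacity amounts:

    # Rank candidate nodes by the sum of provisioned capacity plus the
    # additional "headroom" amount reserved for potential future growth.
    def node_load(partitions):
        return sum(p["provisioned"] + p["additional"] for p in partitions)

    def rank_nodes(nodes):
        """A lower combined capacity commitment ranks a node higher as a
        placement target for new or relocated partitions."""
        return sorted(nodes, key=lambda n: node_load(n["partitions"]))

    nodes = [
        {"id": "node-a", "partitions": [{"provisioned": 100, "additional": 50}]},
        {"id": "node-b", "partitions": [{"provisioned": 60, "additional": 20},
                                        {"provisioned": 40, "additional": 10}]},
    ]
    print([n["id"] for n in rank_nodes(nodes)])   # ['node-b', 'node-a']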
Abstract:
A system that provides services to clients may receive and service requests, various ones of which may require different amounts of work. An admission control mechanism may manage requests based on tokens, each of which represents a fixed amount of work. The tokens may be added to a token bucket at a rate that is dependent on a target work throughput rate while the number of tokens in the bucket does not exceed its maximum capacity. If at least a pre-determined minimum number of tokens is present in the bucket when a service request is received, the request may be serviced. Servicing a request may include deducting an initial number of tokens from the bucket, determining that the amount of work performed in servicing the request is different than that represented by the initially deducted tokens, and deducting additional tokens from or replacing tokens in the bucket to reflect the difference.
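One way the token-bucket admission control and post-hoc adjustment could look in code; the class and method names here are assumptions for illustration:

    # Sketch of a token-bucket admission controller in which tokens represent
    # fixed units of work and the charge is corrected once actual work is known.
    import time

    class TokenBucket:
        def __init__(self, rate, capacity):
            self.rate = rate              # tokens added per second (target throughput)
            self.capacity = capacity      # maximum number of tokens held
            self.tokens = capacity
            self.last = time.monotonic()

        def _refill(self):
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now

        def try_admit(self, initial_cost=1, minimum=1):
            """Admit the request if at least `minimum` tokens are present,
            deducting an initial estimate of its cost."""
            self._refill()
            if self.tokens >= minimum:
                self.tokens -= initial_cost
                return True
            return False

        def settle(self, initial_cost, actual_cost):
            """Deduct extra tokens (or return surplus) to reflect the actual
            work done; the bucket may go negative, throttling later requests."""
            self.tokens = min(self.capacity,
                              self.tokens - (actual_cost - initial_cost))

    bucket = TokenBucket(rate=100, capacity=100)
    if bucket.try_admit(initial_cost=1):
        bucket.settle(initial_cost=1, actual_cost=4)   # request proved heavier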
Abstract:
A corpus of information describing queries used to access a transactional data store may be used to identify analytical relationships that are not explicitly defined in a schema or supplied by a user. Join relationships may be identified based on field coincidence in elements of queries in the corpus. Join relationships may be indicative of dimensions and attributes of a dimension. Hierarchy levels for a dimension may be identified based on factors including data type, reference in an aggregating clause, and reference in a grouping clause.
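A rough sketch of mining join candidates by field coincidence; the corpus format and the min_support threshold are assumptions made for illustration:

    # Count how often pairs of fields appear together in query elements;
    # frequently co-occurring pairs become candidate join relationships.
    from collections import Counter
    from itertools import combinations

    def candidate_joins(corpus, min_support=2):
        """Each corpus entry lists the fields referenced together in one
        query's join/filter elements."""
        pair_counts = Counter()
        for fields in corpus:
            for a, b in combinations(sorted(set(fields)), 2):
                pair_counts[(a, b)] += 1
        return [pair for pair, n in pair_counts.items() if n >= min_support]

    corpus = [
        ["orders.customer_id", "customers.id", "orders.total"],
        ["orders.customer_id", "customers.id", "customers.region"],
        ["orders.date", "orders.total"],
    ]
    print(candidate_joins(corpus))
    # [('customers.id', 'orders.customer_id')]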
Abstract:
A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
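A simplified sketch of the failover attempt, assuming illustrative structures for the group lock and peer state rather than the protocol's actual data model:

    # Acquire the group lock, gather peer state, and become master only if
    # the failover quorum is large enough and no peer holds newer data.
    class GroupLock:
        def __init__(self):
            self.holder = None
        def acquire(self, who):
            if self.holder is None:
                self.holder = who
                return True
            return False
        def release(self):
            self.holder = None

    def attempt_failover(candidate, peers, lock, quorum_size):
        if not lock.acquire(candidate["id"]):
            return False
        try:
            if any(p["last_seq"] > candidate["last_seq"] for p in peers):
                return False              # a peer is newer; synchronize first
            quorum = [candidate] + [p for p in peers if p["supports"]]
            if len(quorum) >= quorum_size:
                candidate["role"] = "master"
                return True
            return False
        finally:
            lock.release()

    candidate = {"id": "r1", "last_seq": 10, "role": "slave"}
    peers = [{"id": "r2", "last_seq": 9, "supports": True},
             {"id": "r3", "last_seq": 10, "supports": True}]
    print(attempt_failover(candidate, peers, GroupLock(), quorum_size=2))   # True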
Abstract:
A distributed database management system comprising a plurality of computing nodes may distribute data evenly across all nodes. A definition of a primary key that divides the primary key into at least a first key portion and a second key portion may be utilized to locate items related by a first key portion to a specific computing node. Application-consistent queries, local transactions and pivoting operations may be performed on items related by a first key portion.
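A small sketch of routing by the first key portion, using an assumed hash-based placement rule purely for illustration:

    # Items sharing the first key portion land on the same node, which is what
    # enables local queries, transactions, and pivots over that item group;
    # the second key portion orders items within the node.
    import hashlib

    def node_for(first_key_portion, nodes):
        digest = hashlib.md5(first_key_portion.encode()).hexdigest()
        return nodes[int(digest, 16) % len(nodes)]

    nodes = ["node-0", "node-1", "node-2"]
    for second in ("2024-01-01", "2024-01-02"):
        print(("c-42", second), "->", node_for("c-42", nodes))   # same node both times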
Abstract:
An online analytical processing system may comprise an n-dimensional cube partitioned into slices, in which each slice may represent data points at the intersections of fixed and variable dimensions. Computation of data points within a slice may be deferred. A dependency graph may be initially constructed, in which the dependency graph is utilized in a subsequent computation. Calculation of data points may be prioritized based on information indicative of a chance that the data points will be accessed.
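A sketch of deferred, priority-ordered slice computation driven by a dependency graph; the scheduling heuristic and names are assumptions, not the patented method:

    # Build the dependency graph first, defer computation, and then compute
    # pending slices in order of an estimated likelihood of access.
    import heapq

    def compute_order(dependencies, access_likelihood):
        """dependencies maps slice -> set of slices it depends on; a higher
        access_likelihood means the slice is computed sooner, after its inputs."""
        ready = [s for s, deps in dependencies.items() if not deps]
        heap = [(-access_likelihood.get(s, 0.0), s) for s in ready]
        heapq.heapify(heap)
        done, order = set(), []
        while heap:
            _, s = heapq.heappop(heap)
            order.append(s)
            done.add(s)
            queued = {x for _, x in heap}
            for t, deps in dependencies.items():
                if t not in done and t not in queued and all(d in done for d in deps):
                    heapq.heappush(heap, (-access_likelihood.get(t, 0.0), t))
        return order

    deps = {"totals_by_region": set(),
            "totals_by_region_and_month": {"totals_by_region"},
            "yearly_rollup": {"totals_by_region"}}
    print(compute_order(deps, {"yearly_rollup": 0.9,
                               "totals_by_region_and_month": 0.2}))
    # ['totals_by_region', 'yearly_rollup', 'totals_by_region_and_month']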
Abstract:
An online analytical processing system may comprise an n-dimensional cube structured using slice-based partitioning in which each slice comprises data points corresponding to a set of dimension values fixed across the slice and a set of dimension values allowed to vary. Slices may be partitioned and replicated across computing nodes. Views of the n-dimensional cube may be partially materialized by determining dependencies between slices. A central data dictionary may maintain information about slices and slice dependencies. Dimensions may be added by adding a new slice without requiring immediate recomputation of existing data points.
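A minimal sketch of a central data dictionary that records slices and their dependencies, where adding a dimension registers a new slice without recomputation; the structure and names are illustrative assumptions:

    # Track each slice's fixed dimension values and inter-slice dependencies.
    class DataDictionary:
        def __init__(self):
            self.slices = {}          # slice id -> fixed dimension values
            self.dependencies = {}    # slice id -> set of slice ids it depends on

        def register_slice(self, slice_id, fixed_dims, depends_on=()):
            self.slices[slice_id] = dict(fixed_dims)
            self.dependencies[slice_id] = set(depends_on)

        def add_dimension(self, slice_id, fixed_dims, depends_on=()):
            """A new dimension arrives as a new slice; existing slices and
            their computed data points are not recomputed at this time."""
            self.register_slice(slice_id, fixed_dims, depends_on)

    catalog = DataDictionary()
    catalog.register_slice("sales[region=EU]", {"region": "EU"})
    catalog.add_dimension("sales[region=EU,channel=web]",
                          {"region": "EU", "channel": "web"},
                          depends_on=["sales[region=EU]"])
    print(sorted(catalog.slices))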