-
Publication number: US11899684B2
Publication date: 2024-02-13
Application number: US17472445
Application date: 2021-09-10
Applicant: Amazon Technologies, Inc.
Inventor: Timothy Andrew Rath, David Alan Lutz
IPC: G06F17/00, G06F16/27, G06F16/182, H04W84/20
CPC classification number: G06F16/27, G06F16/182, G06F16/273, G06F16/278, H04W84/20
Abstract: A system that implements a data storage service may store data on behalf of clients in multiple replicas on respective computing nodes. The system may employ an external service to select a master replica for a replica group. The master replica may service consistent read operations and/or write operations that are directed to the replica group (or to a data partition stored by the replica group). The master replica may employ a quorum based mechanism for performing replicated write operations, and a local lease mechanism for determining the replica authorized to perform consistent reads, even when the external service is unavailable. The master replica may propagate local leases to replica group members as replicated writes. If another replica assumes mastership for the replica group, it may not begin servicing consistent read operations that are directed to the replica group until the lease period for a current local lease expires.
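The lease rule in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Lease` and `Replica` classes and the `grant_lease` helper are hypothetical names, time is passed in explicitly, and the quorum-replicated write is simplified to updating every replica in the group.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lease:
    holder: str        # name of the replica the lease was granted to
    expires_at: float  # time at which the lease stops being valid


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.is_master = False
        self.lease: Optional[Lease] = None  # latest lease learned via replication

    def can_serve_consistent_read(self, now: float) -> bool:
        # Only a master holding an unexpired lease of its own may serve.
        return (
            self.is_master
            and self.lease is not None
            and self.lease.holder == self.name
            and now < self.lease.expires_at
        )


def grant_lease(group: List[Replica], master: Replica, now: float, duration: float) -> bool:
    """Propagate a new lease for `master` (assumption: every replica acknowledges)."""
    current = master.lease
    # A new master must wait out a live lease held by a different replica.
    if current is not None and current.holder != master.name and now < current.expires_at:
        return False
    lease = Lease(holder=master.name, expires_at=now + duration)
    for replica in group:
        replica.lease = lease
    return True
```

In this sketch a replica that has just assumed mastership cannot grant itself a lease, and therefore cannot serve consistent reads, until the previous master's lease has expired.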
-
Publication number: US11894972B2
Publication date: 2024-02-06
Application number: US17811519
Application date: 2022-07-08
Applicant: Amazon Technologies, Inc.
Inventor: Timothy Andrew Rath, Jakub Kulesza, David Alan Lutz
IPC: G06F11/00, H04L41/0668, G06F11/20, G06F11/14, G06F11/16, H04L67/51, G06F3/06, H04L67/1097
CPC classification number: H04L41/0668, G06F3/0617, G06F3/0653, G06F3/0659, G06F3/0683, G06F11/1425, G06F11/1662, G06F11/2028, G06F11/2041, G06F11/2094, G06F11/2097, H04L67/1097, H04L67/51, G06F11/2048, G06F2201/825
Abstract: A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
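The failover decision described above might be sketched as follows. Everything here is an assumption made for illustration: the `PeerState` record, the use of a log sequence number (`latest_lsn`) to compare recency, counting the candidate toward its own quorum, and the strict-majority threshold are not taken from the patent text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeerState:
    name: str
    supports: bool   # peer is willing to support the candidate
    latest_lsn: int  # hypothetical measure of how recent the peer's data is


def attempt_failover(candidate_lsn: int, peers: List[PeerState],
                     group_size: int, lock_acquired: bool) -> str:
    """Return 'master', 'sync_then_retry', or 'abandon' for one attempt."""
    if not lock_acquired:
        return "abandon"  # the replica-group lock is held elsewhere
    quorum = 1  # assumption: the candidate counts toward its own quorum
    needs_sync = False
    for peer in peers:
        if peer.latest_lsn > candidate_lsn:
            needs_sync = True  # the peer stores more recent data
        elif peer.supports:
            quorum += 1        # the peer joins the failover quorum
    if needs_sync:
        return "sync_then_retry"
    # Assumption: "enough replicas" means a strict majority of the group.
    return "master" if quorum > group_size // 2 else "abandon"
```

For a three-replica group, one supporting peer that is not ahead of the candidate is enough for the attempt to succeed under these assumptions.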
-
Publication number: US11687555B2
Publication date: 2023-06-27
Application number: US16684901
Application date: 2019-11-15
Applicant: Amazon Technologies, Inc.
Inventor: Akshat Vig, Timothy Andrew Rath, Stuart Henry Seelye Marshall, Rande A. Blackman, David Alan Lutz, Jian Wang, Jiandan Zheng, Janani Narayanan
IPC: G06F17/30, G06F16/27, G06F16/28, G06F16/90, G06F16/2458, G06F16/2455
CPC classification number: G06F16/27, G06F16/2471, G06F16/24565, G06F16/28, G06F16/90
Abstract: Methods and apparatus for conditional master election in a distributed database are described. A plurality of replicas of a database object are stored by a distributed database service. Some types of operations corresponding to client requests directed at the database object are to be coordinated by a master replica. Client access to the database object is enabled prior to election of a master replica. In response to a triggering condition, a particular replica is elected master. The master coordinates implementation of operations with one or more other replicas in response to client requests.
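One way to picture conditional election is a replica group that serves requests before any master exists and elects one lazily. The sketch below is illustrative only: the abstract does not say what the triggering condition is, so it assumes the trigger is the first request that needs coordination (a write or a consistent read). The `ReplicaGroup` class and its lowest-name election rule are hypothetical.

```python
from typing import Dict, List, Optional


class ReplicaGroup:
    def __init__(self, replicas: List[str]) -> None:
        self.replicas = list(replicas)
        self.master: Optional[str] = None  # no master until a trigger occurs
        self.store: Dict[str, int] = {}

    def _elect(self) -> None:
        # Hypothetical deterministic rule: lowest replica name wins.
        self.master = min(self.replicas)

    def read(self, key: str, consistent: bool = False) -> Optional[int]:
        # Assumption: only consistent reads need master coordination.
        if consistent and self.master is None:
            self._elect()
        return self.store.get(key)

    def write(self, key: str, value: int) -> str:
        # Assumption: the first write is a triggering condition.
        if self.master is None:
            self._elect()
        self.store[key] = value
        return self.master
```

Under these assumptions, an eventually consistent read succeeds while `master` is still `None`, and the first write causes the election.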
-
Publication number: US11388043B2
Publication date: 2022-07-12
Application number: US16833334
Application date: 2020-03-27
Applicant: Amazon Technologies, Inc.
Inventor: Timothy Andrew Rath, Jakub Kulesza, David Alan Lutz
IPC: G06F11/00, H04L41/0668, G06F11/20, G06F11/14, G06F11/16, G06F3/06, H04L67/1097, H04L67/51
Abstract: A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
-
Publication number: US10257288B2
Publication date: 2019-04-09
Application number: US14570900
Application date: 2014-12-15
Applicant: Amazon Technologies, Inc.
Inventor: Wei Xiao, David Alan Lutz, Timothy Andrew Rath, Maximiliano Maccanti, Miguel Mascarenhas Filipe, David Craig Yanacek
Abstract: A system that provides services to clients may receive and service requests, various ones of which may require different amounts of work. The system may determine whether it is operating in an overloaded or underloaded state based on a current work throughput rate, a target work throughput rate, a maximum request rate, or an actual request rate, and may dynamically adjust the maximum request rate in response. For example, if the maximum request rate is being exceeded, the maximum request rate may be raised or lowered, dependent on the current work throughput rate. If the target or committed work throughput rate is being exceeded, but the maximum request rate is not being exceeded, a lower maximum request rate may be proposed. Adjustments to the maximum request rate may be made using multiple incremental adjustments. Service request tokens may be added to a leaky token bucket at the maximum request rate.
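The adjustment rules in this abstract can be sketched as one incremental step plus a simple token bucket. This is a rough illustration, not the patented algorithm: the 10% step size, the function and class names, and the bucket starting full are all assumptions.

```python
def adjust_max_rate(max_rate: float, request_rate: float, work_rate: float,
                    target_work_rate: float, step: float = 0.1) -> float:
    """Return the next maximum request rate after one incremental adjustment."""
    over_requests = request_rate > max_rate
    over_work = work_rate > target_work_rate
    if over_requests and not over_work:
        return max_rate * (1 + step)  # requests are throttled but work capacity remains
    if over_work:
        return max_rate * (1 - step)  # too much work is being admitted
    return max_rate                   # neither limit is exceeded


class TokenBucket:
    """Tokens are added at the current maximum request rate, up to a capacity."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # assumption: the bucket starts full

    def refill(self, elapsed_seconds: float) -> None:
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed_seconds)

    def try_consume(self) -> bool:
        # Each admitted request costs one token.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Calling `adjust_max_rate` repeatedly and assigning the result to `bucket.rate` would give the gradual, multi-step adjustment the abstract mentions.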
-
Publication number: US09886348B2
Publication date: 2018-02-06
Application number: US14754564
Application date: 2015-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Timothy Andrew Rath, Jakub Kulesza, David Alan Lutz
CPC classification number: G06F11/1451, G06F11/1425, G06F11/2094, G06F11/2097, G06F17/30557, G06F17/30575, G06F17/30578
Abstract: A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of partitions that are stored on respective computing nodes in the system. A master replica for a replica group may increment a membership version indicator for the group, and may propagate metadata (including the membership version indicator) indicating a membership change for the group to other members of the group. Propagating the metadata may include sending a log record containing the metadata to the other replicas to be appended to their respective logs. Once the membership change becomes durable, it may be committed. A replica attempting to become the master of a replica group may determine that another replica in the group has observed a more recent membership version, in which case logs may be synchronized or snipped, or the attempt may be abandoned.
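A minimal sketch of the membership-versioning idea follows. The names (`Replica`, `propose_membership_change`, `may_attempt_mastership`), the `reachable` set used to simulate which peers receive the log record, and the majority rule for durability are assumptions for illustration. Log synchronization and snipping are not modelled.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Replica:
    name: str
    membership_version: int = 0
    log: List[dict] = field(default_factory=list)


def propose_membership_change(master: Replica, others: List[Replica],
                              new_members: Iterable[str], reachable: Set[str]) -> bool:
    """Append a membership record to reachable logs; report whether it can commit."""
    version = master.membership_version + 1  # the master increments the version
    record = {"type": "membership", "version": version, "members": tuple(new_members)}
    acks = 0
    for replica in [master, *others]:
        if replica is master or replica.name in reachable:
            replica.log.append(record)
            replica.membership_version = version
            acks += 1
    # Assumption: the change is durable once a majority has logged it.
    return acks > (len(others) + 1) // 2


def may_attempt_mastership(candidate: Replica, peers: List[Replica]) -> bool:
    # A candidate that is behind on membership must synchronize or give up.
    return all(p.membership_version <= candidate.membership_version for p in peers)
```

A replica that missed the membership record sees a higher version on its peers and, in this sketch, is refused until it catches up.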
-
Publication number: US09460185B2
Publication date: 2016-10-04
Application number: US14733887
Application date: 2015-06-08
Applicant: Amazon Technologies, Inc.
Inventor: Bjorn Patrick Swift, Wei Xiao, Stuart Henry Seelye Marshall, Stefano Stefani, Timothy Andrew Rath, David Alan Lutz
IPC: G06F17/30
CPC classification number: G06F17/30584, G06F17/30575
Abstract: A system that implements a data storage service may store data in multiple replicated partitions on respective storage nodes. The selection of the storage nodes (or storage devices thereof) on which to store the partition replicas may be performed by administrative components that are responsible for partition management and resource allocation for respective groups of storage nodes (e.g., based on a global view of resource capacity or usage), or the selection of particular storage devices of a storage node may be determined by the storage node itself (e.g., based on a local view of resource capacity or usage). Placement policies applied at the administrative layer or storage layer may be based on the percentage or amount of provisioned, reserved, or available storage or IOPS capacity on each storage device, and particular placements (or subsequent operations to move partition replicas) may result in an overall resource utilization that is well balanced.
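A placement policy of the kind described could look like the sketch below. It is one possible policy, not the one claimed: the `Device` fields and the rule of picking the device whose busiest resource (storage or IOPS) would be least utilized after placement are assumptions chosen to illustrate balanced utilization.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    storage_total: float
    storage_used: float
    iops_total: float
    iops_used: float


def choose_device(devices: List[Device], need_storage: float,
                  need_iops: float) -> Optional[Device]:
    """Pick the device that stays best balanced after hosting the replica."""
    candidates = [
        d for d in devices
        if d.storage_total - d.storage_used >= need_storage
        and d.iops_total - d.iops_used >= need_iops
    ]
    if not candidates:
        return None  # no device has enough free storage and IOPS

    def utilization_after(d: Device) -> float:
        # Utilization of the more heavily loaded of the two resources.
        return max(
            (d.storage_used + need_storage) / d.storage_total,
            (d.iops_used + need_iops) / d.iops_total,
        )

    return min(candidates, key=utilization_after)
```

The same function could be applied at the administrative layer over nodes or by a storage node over its own devices, given a global or local view of the numbers.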
-
Publication number: US20240205073A1
Publication date: 2024-06-20
Application number: US18391261
Application date: 2023-12-20
Applicant: Amazon Technologies, Inc.
Inventor: Timothy Andrew Rath, Jakub Kulesza, David Alan Lutz
IPC: H04L41/0668, G06F3/06, G06F11/14, G06F11/16, G06F11/20, H04L67/1097, H04L67/51
CPC classification number: H04L41/0668, G06F3/0617, G06F3/0653, G06F3/0659, G06F3/0683, G06F11/1425, G06F11/1662, G06F11/2028, G06F11/2041, G06F11/2094, G06F11/2097, H04L67/1097, H04L67/51, G06F11/2048, G06F2201/825
Abstract: A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
-
Publication number: US20220345358A1
Publication date: 2022-10-27
Application number: US17811519
Application date: 2022-07-08
Applicant: Amazon Technologies, Inc.
Inventor: Timothy Andrew Rath, Jakub Kulesza, David Alan Lutz
IPC: H04L41/0668, G06F11/20, G06F11/14, G06F11/16, H04L67/51, G06F3/06, H04L67/1097
Abstract: A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of various partitions that are stored on respective computing nodes in the system. The system may employ a single master failover protocol, usable when a replica attempts to become the master replica for a replica group of which it is a member. Attempting to become the master replica may include acquiring a lock associated with the replica group, and gathering state information from the other replicas in the group. The state information may indicate whether another replica supports the attempt (in which case it is included in a failover quorum) or stores more recent data or metadata than the replica attempting to become the master (in which case synchronization may be required). If the failover quorum includes enough replicas, the replica may become the master.
-
Publication number: US10929240B2
Publication date: 2021-02-23
Application number: US15887853
Application date: 2018-02-02
Applicant: Amazon Technologies, Inc.
Inventor: Timothy Andrew Rath, Jakub Kulesza, David Alan Lutz
Abstract: A system that implements a data storage service may store data on behalf of storage service clients. The system may maintain data in multiple replicas of partitions that are stored on respective computing nodes in the system. A master replica for a replica group may increment a membership version indicator for the group, and may propagate metadata (including the membership version indicator) indicating a membership change for the group to other members of the group. Propagating the metadata may include sending a log record containing the metadata to the other replicas to be appended to their respective logs. Once the membership change becomes durable, it may be committed. A replica attempting to become the master of a replica group may determine that another replica in the group has observed a more recent membership version, in which case logs may be synchronized or snipped, or the attempt may be abandoned.