Abstract:
A secondary location of a network acts as a recovery network for a primary location of a service. The secondary location is maintained in a warm state and is configured to replace the primary location in the event of a failover. During normal operation, the primary location actively services user load and performs backups, including full backups, incremental backups, and transaction logs, that are automatically replicated to the secondary location. Information (e.g., time, retry count) is stored that may be used to assist in determining when the backups are restored correctly at the secondary location. The backups are restored and the transaction logs are replayed at the secondary location to reflect changes (content and administrative) that are made to the primary location. After failover to the secondary location, the secondary location becomes the primary location and begins to actively service the user load.
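The restore-and-replay order described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a backup is a dictionary of items and that a transaction log is a list of (key, value) pairs in which a value of None marks a deletion; the function name `restore` and the data shapes are hypothetical and are not taken from the abstract.

```python
def restore(full_backup, incrementals, transaction_logs):
    """Rebuild secondary-location state from replicated backups (illustrative)."""
    # Start from the most recent full backup.
    state = dict(full_backup)
    # Apply incremental backups in the order they were taken.
    for incremental in incrementals:
        state.update(incremental)
    # Replay transaction log entries last; None marks a deleted item (assumption).
    for key, value in transaction_logs:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


full = {"doc1": "v1", "doc2": "v1"}
incrementals = [{"doc2": "v2"}, {"doc3": "v1"}]
logs = [("doc1", None), ("doc3", "v2")]
secondary = restore(full, incrementals, logs)
```

The sketch omits the stored restore information (time, retry count) that the abstract mentions for verifying a correct restore.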
Abstract:
Software that normally cannot be installed on a machine through a remote process is installed by a high privilege installer running on the machine. A request is received from a remote machine to install software on the machine using the high privilege installer. The high privilege installer determines when software that was requested remotely is to be installed. For example, the high privilege installer may monitor an install queue for software to be installed. When there are entries in the install queue, the high privilege installer is used to install the software. When there are no entries in the install queue, the high privilege installer may sleep until there is more software that is identified to be installed.
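The queue-monitoring behavior can be sketched as follows. This is a minimal illustration, assuming an in-process `queue.Queue` stands in for the install queue and that a caller-supplied `install` function performs the privileged installation; `run_installer`, `idle_timeout`, and `max_idle_polls` are hypothetical names, and the idle limit exists only so the example terminates.

```python
import queue


def run_installer(install_queue, install, idle_timeout=0.05, max_idle_polls=1):
    """Install queued packages; wait (sleep) while the queue is empty."""
    installed = []
    idle_polls = 0
    while idle_polls < max_idle_polls:
        try:
            # Blocks until an entry arrives or the timeout expires.
            package = install_queue.get(timeout=idle_timeout)
        except queue.Empty:
            idle_polls += 1
            continue
        idle_polls = 0
        install(package)  # runs locally with the installer's privileges
        installed.append(package)
    return installed


# Remote requests only add entries; they never run the install themselves.
requests = queue.Queue()
requests.put("package-a")
requests.put("package-b")
log = []
result = run_installer(requests, log.append)
```

A long-running installer would loop indefinitely rather than stop after a fixed number of idle polls.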
Abstract:
A machine manager controls the deployment and management of machines for an online service. The machine manager is configured to manually or automatically deploy farms, upgrade farms, add machines, remove machines, start machines, stop machines, and the like. The machine manager keeps track of the locations of the machines, the roles of the machines within the networks, as well as other characteristics relating to the machines (e.g., health of the machines). Instead of upgrading software on the machines in a farm that are currently handling requests, one or more machines are configured in a new farm with the selected disk images and then the requests are moved from the old farm to the new farm.
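The upgrade-by-replacement approach can be sketched in a few lines. This is an illustration only, assuming farms are tracked in a dictionary and that a single `active` entry decides which farm receives requests; `upgrade_farm`, the farm names, and the image name are hypothetical.

```python
def upgrade_farm(farms, routing, old_farm, new_farm, machine_count, image):
    """Build a new farm from a disk image, then move requests to it."""
    # Configure machines in a new farm instead of touching the old one.
    farms[new_farm] = [
        {"name": f"{new_farm}-{i}", "image": image} for i in range(machine_count)
    ]
    # Move requests from the old farm to the new farm.
    routing["active"] = new_farm
    # The old farm no longer handles requests and can be retired.
    return farms.pop(old_farm)


farms = {"farm-1": [{"name": "farm-1-0", "image": "image-v1"}]}
routing = {"active": "farm-1"}
retired = upgrade_farm(farms, routing, "farm-1", "farm-2", 2, "image-v2")
```

Because the old farm keeps serving until the switch, the machines handling requests are never upgraded in place.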
Abstract:
An online service includes managed databases that include one or more tenants (e.g. customers, users). A multi-tenant database may be split between two or more databases while the database being split continues processing requests. For example, web servers continue to request operations on the database while content is being moved. After moving the content, tenant traffic is automatically redirected to the database that contains the tenant's content.
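The move-then-redirect sequence can be sketched as below. It assumes, for illustration, that each database is a dictionary keyed by tenant and that a separate tenant map decides where requests for a tenant are sent; `split_database` and the database names are hypothetical.

```python
def split_database(databases, tenant_map, source, target, tenants_to_move):
    """Move selected tenants to another database, then redirect their traffic."""
    databases.setdefault(target, {})
    for tenant in tenants_to_move:
        # Copy first; the tenant map still points at the source during the copy.
        databases[target][tenant] = list(databases[source][tenant])
        # Redirect tenant traffic to the database that now holds the content.
        tenant_map[tenant] = target
        # Remove the old copy only after the redirect.
        del databases[source][tenant]


databases = {"db-1": {"tenant-a": ["doc1"], "tenant-b": ["doc2", "doc3"]}}
tenant_map = {"tenant-a": "db-1", "tenant-b": "db-1"}
split_database(databases, tenant_map, "db-1", "db-2", ["tenant-b"])
```

The sketch does not handle writes that arrive while a tenant is being copied, which a real split would have to reconcile before redirecting.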
Abstract:
Objects are placed on hosts using hard constraints and soft constraints. The objects to be placed on the hosts may be many different types of objects. For example, the objects to place may include tenants in a database, virtual machines on a physical machine, databases on a virtual machine, tenants in directory forests, tenants in farms, and the like. When determining a host for an object, a pool of hosts is filtered through a series of hard constraints. The remaining pool of hosts is further filtered through soft constraints to help in selection of a host. A host is then chosen from the remaining hosts.
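The two-stage filtering can be sketched as follows. Two choices here are assumptions rather than part of the abstract: a soft constraint is skipped when it would empty the pool, and the first remaining host is the one chosen. The function `place` and the host fields are hypothetical.

```python
def place(obj, hosts, hard_constraints, soft_constraints):
    """Pick a host for obj, or return None when no host passes the hard constraints."""
    # Hard constraints: a host that fails any one of them is removed.
    pool = [h for h in hosts if all(c(obj, h) for c in hard_constraints)]
    if not pool:
        return None
    # Soft constraints: narrow the pool, but never down to nothing (assumption).
    for constraint in soft_constraints:
        preferred = [h for h in pool if constraint(obj, h)]
        if preferred:
            pool = preferred
    # Choose from the remaining hosts; taking the first is an assumption.
    return pool[0]


hosts = [
    {"name": "a", "free_gb": 10, "region": "eu"},
    {"name": "b", "free_gb": 50, "region": "us"},
    {"name": "c", "free_gb": 30, "region": "eu"},
]
has_capacity = lambda obj, host: host["free_gb"] >= obj["size_gb"]
same_region = lambda obj, host: host["region"] == obj["region"]

tenant = {"size_gb": 20, "region": "eu"}
chosen = place(tenant, hosts, [has_capacity], [same_region])
too_big = place({"size_gb": 100, "region": "eu"}, hosts, [has_capacity], [same_region])
```

Here host "a" fails the capacity check, and the region preference then selects "c" over "b".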
Abstract:
A cloud manager is utilized in the patching of physical machines and virtual machines that are used within an online service, such as an online content management service. The cloud manager assists in the scheduling of the application of software patches to the machines (physical and virtual) within the network such that the availability of the online service is maintained while machines are being patched. The machines to be patched are partitioned into groups that are patched at different times. Generally, the machines are partitioned into highly available, independent groups such that one or more of the groups that are not currently being patched continue to provide the service(s) of the group that is being patched. The machines (physical and virtual) within each of the groups may be patched in parallel.
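One way to picture the grouping is the sketch below. It assumes each machine has a role and spreads the machines of each role across groups round-robin, so that while one group is patched the other groups still hold machines of every role; this partitioning rule, and the names `partition` and `patch_all`, are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(machines, group_count):
    """Spread the machines of each role across the groups, round-robin."""
    groups = [[] for _ in range(group_count)]
    by_role = {}
    for name, role in machines:
        by_role.setdefault(role, []).append(name)
    for names in by_role.values():
        for i, name in enumerate(names):
            groups[i % group_count].append(name)
    return groups


def patch_all(groups, patch):
    """Patch groups one after another; machines within a group in parallel."""
    for group in groups:
        with ThreadPoolExecutor() as pool:
            # list() waits for every machine in the group to finish.
            list(pool.map(patch, group))


machines = [("web1", "web"), ("web2", "web"), ("sql1", "sql"), ("sql2", "sql")]
groups = partition(machines, 2)
patched = []
patch_all(groups, patched.append)
```

With two groups, each role keeps one unpatched machine in service while the other group is being patched.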
Abstract:
An idempotent and asynchronous application programming interface (API) that cannot rely on a reliable network is used by a cloud manager to receive and process requests. The cloud manager system is a central coordination service that receives requests using the API to perform update operations and get operations relating to the online service. For example, the API includes methods for deploying machines, updating machines, removing machines, performing configuration changes on servers and virtual machines (VMs), as well as performing other tasks relating to the management of the online service. Receiving and processing the same API call multiple times produces the same result.
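Idempotency of this kind can be sketched as below. The sketch assumes each call carries a caller-chosen request identifier and that the manager remembers the job it created for that identifier, so a retry after a lost response returns the same job instead of deploying again. The class `CloudManager`, the method `deploy_machine`, and the job fields are hypothetical, and the asynchronous execution of the job is left out.

```python
class CloudManager:
    """Illustrative coordinator that tolerates repeated identical calls."""

    def __init__(self):
        self.jobs = {}        # request_id -> job record
        self.machines = set()

    def deploy_machine(self, request_id, name):
        # A retried call with a known request_id returns the existing job.
        if request_id in self.jobs:
            return self.jobs[request_id]
        self.machines.add(name)
        job = {"request_id": request_id, "operation": "deploy", "machine": name}
        self.jobs[request_id] = job
        return job


manager = CloudManager()
first = manager.deploy_machine("req-1", "vm-01")
second = manager.deploy_machine("req-1", "vm-01")  # e.g. a retry after a timeout
```

A caller on an unreliable network can therefore retry freely without risking a duplicate deployment.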
Abstract:
Jobs submitted to a primary location of a service within a period of time before and/or after a fail-over event are determined and are resubmitted to a secondary location of the service. For example, jobs that are submitted fifteen minutes before the fail-over event and jobs that are submitted to the primary network before the fail-over to the secondary location is completed are resubmitted at the secondary location. After the fail-over event occurs, the jobs are updated with the secondary network that is taking the place of the primary location of the service. A mapping of job input parameters (e.g., identifiers and/or secrets) from the primary location to the secondary location is used by the jobs when they are resubmitted to the secondary location. Each job determines what changes are to be made to the job request based on the job being resubmitted.
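The selection window and the parameter mapping can be sketched as follows. The sketch assumes jobs are dictionaries with a submission time and string parameters, and it applies the mapping in one central function, whereas the abstract has each job decide its own changes. The names `jobs_to_resubmit`, `remap`, and all identifiers in the example are hypothetical.

```python
from datetime import datetime, timedelta


def jobs_to_resubmit(jobs, failover_start, failover_end,
                     lookback=timedelta(minutes=15)):
    """Select jobs submitted shortly before, or during, the fail-over."""
    window_start = failover_start - lookback
    return [j for j in jobs if window_start <= j["submitted"] <= failover_end]


def remap(job, id_map):
    """Rewrite primary-location parameter values to their secondary equivalents."""
    params = {k: id_map.get(v, v) for k, v in job["params"].items()}
    return {**job, "params": params, "location": "secondary"}


start = datetime(2024, 1, 1, 12, 0)
end = datetime(2024, 1, 1, 12, 10)
jobs = [
    {"id": 1, "submitted": datetime(2024, 1, 1, 11, 30), "params": {"farm": "farm-primary-7"}},
    {"id": 2, "submitted": datetime(2024, 1, 1, 11, 50), "params": {"farm": "farm-primary-7"}},
    {"id": 3, "submitted": datetime(2024, 1, 1, 12, 5), "params": {"farm": "farm-primary-7"}},
]
id_map = {"farm-primary-7": "farm-secondary-7"}
resubmitted = [remap(j, id_map) for j in jobs_to_resubmit(jobs, start, end)]
```

Job 1 falls outside the fifteen-minute lookback and is not resubmitted; jobs 2 and 3 are resubmitted with the secondary farm identifier.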
Abstract:
Web request routers in a cloud management system are used to route requests to content within the networks that are associated with an online service. The web request routers receive requests, parse the requests and forward the requests to the appropriate destination. The web request routers may use application-specific logic for routing the requests. For example, the requests may be routed based on a document identifier and/or user information that is included within the received request. A lookup table may be used in determining a destination for the request. When a location of content changes within the online service, the lookup table may be updated such that the web request routers automatically direct requests to the updated location. A user may also specify where their requests are to be routed.
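Routing through a lookup table can be sketched as below. The sketch assumes the document identifier travels in a `doc` query parameter and that the table maps identifiers to destination names; the class `RequestRouter`, the parameter name, the URL, and the destinations are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs


class RequestRouter:
    """Illustrative router that maps document identifiers to destinations."""

    def __init__(self, table):
        self.table = dict(table)  # document id -> destination

    def route(self, url):
        # Parse the request and pull out the document identifier.
        query = parse_qs(urlparse(url).query)
        doc_id = query["doc"][0]
        # Look up the destination that currently holds the content.
        return self.table[doc_id]

    def move(self, doc_id, new_destination):
        # Updating the table redirects later requests automatically.
        self.table[doc_id] = new_destination


router = RequestRouter({"42": "farm-1"})
before = router.route("https://service.example/view?doc=42")
router.move("42", "farm-2")
after = router.route("https://service.example/view?doc=42")
```

Callers keep using the same URL; only the table entry changes when content moves.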