Task scheduling for machine-learning workloads
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, are described for scheduling tasks of machine-learning (ML) workloads. A system receives requests to perform the workloads and determines, based on the requests, the resource requirements for performing them. The system includes multiple hosts, and each host includes multiple accelerators. For a given workload, the system determines a quantity of hosts to be assigned to execute tasks of the workload, based on the resource requirement and the accelerators of each host. For each host in that quantity of hosts, the system generates a task specification based on a memory access topology of the host. The task specification specifies the task to be executed at the host using resources of the host, which include the multiple accelerators. The system provides the task specifications to the hosts, and the workloads are performed when each host executes the tasks assigned to it in its task specification.
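The flow described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that the resource requirement is expressed as a count of accelerators, that the memory access topology of a host is modeled as a mapping from NUMA node to the accelerators local to that node, and that one task is emitted per NUMA node. The names Host, TaskSpec, hosts_needed, and generate_task_specs are hypothetical and do not appear in the source.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Host:
    name: str
    # Assumed model of the memory access topology:
    # NUMA node id -> ids of the accelerators local to that node.
    numa_accelerators: Dict[int, List[int]]

    @property
    def accelerator_count(self) -> int:
        return sum(len(accs) for accs in self.numa_accelerators.values())


@dataclass
class TaskSpec:
    host: str
    task_id: int
    numa_node: int
    accelerators: List[int]


def hosts_needed(required_accelerators: int, accelerators_per_host: int) -> int:
    """Quantity of hosts needed, assuming identical hosts (an assumption)."""
    return math.ceil(required_accelerators / accelerators_per_host)


def generate_task_specs(required_accelerators: int, hosts: List[Host]) -> List[TaskSpec]:
    """Emit one task per NUMA node so each task uses only node-local accelerators."""
    remaining = required_accelerators
    specs: List[TaskSpec] = []
    for host in hosts:
        for node, accs in sorted(host.numa_accelerators.items()):
            if remaining <= 0:
                return specs
            chosen = accs[:remaining]  # never take more than is still required
            specs.append(TaskSpec(host.name, len(specs), node, chosen))
            remaining -= len(chosen)
    if remaining > 0:
        raise ValueError("not enough accelerators across the given hosts")
    return specs


if __name__ == "__main__":
    # Two hypothetical hosts, each with two NUMA nodes and two accelerators per node.
    hosts = [
        Host("host-0", {0: [0, 1], 1: [2, 3]}),
        Host("host-1", {0: [0, 1], 1: [2, 3]}),
    ]
    print(hosts_needed(6, hosts[0].accelerator_count))
    for spec in generate_task_specs(6, hosts):
        print(spec)
```

Grouping accelerators by NUMA node is one plausible reading of "based on a memory access topology of the host": it keeps each task on accelerators that share local memory. The source does not state that tasks are split this way.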