Invention Grant
- Patent Title: System and method for dynamic scheduling of distributed deep learning training jobs
- Application No.: US16690999
- Application Date: 2019-11-21
- Publication No.: US11693706B2
- Publication Date: 2023-07-04
- Inventor: Timothy Capes, Iqbal Mohomed, Vishal Raheja, Mete Kemertas
- Applicant: SAMSUNG ELECTRONICS CO., LTD.
- Applicant Address: KR Gyeonggi-do
- Assignee: SAMSUNG ELECTRONICS CO., LTD.
- Current Assignee: SAMSUNG ELECTRONICS CO., LTD.
- Current Assignee Address: KR Suwon-si
- Agency: Sughrue Mion, PLLC
- Main IPC: G06F9/50
- IPC: G06F9/50 ; G06V10/82 ; G06N3/08 ; G06N7/08 ; G06F18/214 ; G06N5/01 ; G06V10/764 ; G06V10/94 ; G06V10/96 ; G06N3/084

Abstract:
A scheduling algorithm for training deep neural network (DNN) weights on processing units (PUs) uses a doubling heuristic to identify the next job to which a PU is provisionally assigned. The doubling heuristic uses an estimated number of training sets needed to complete training of the weights for a given job and/or a training speed function that indicates how fast the weights are converging. The scheduling algorithm addresses the problem of assigning PUs efficiently when multiple DNN weight data structures must be trained. In some embodiments, the weights are trained using a ring-based message passing architecture. In some embodiments, scheduling proceeds in a nested loop. In inner iterations of the nested loop, PUs are scheduled and jobs are launched or restarted. In outer iterations, jobs are stopped, parameters are updated, and the inner iteration is re-entered.
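The abstract names a doubling heuristic but does not spell out its steps, so the following Python sketch is only one plausible reading, not the patented method. It assumes each job carries an estimate of remaining work and a speed function of PU count, and it greedily doubles the PU count of whichever job would save the most estimated time. The names `Job`, `remaining_epochs`, `speed`, `time_left`, and `schedule` are all hypothetical, and the outer stop-and-update loop is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Job:
    name: str
    remaining_epochs: float        # assumed estimate of training work left
    speed: Callable[[int], float]  # assumed: epochs per second on n PUs


def time_left(job: Job, pus: int) -> float:
    """Estimated seconds to finish the job on `pus` processing units."""
    return job.remaining_epochs / job.speed(pus)


def schedule(jobs: List[Job], total_pus: int) -> Dict[str, int]:
    """Greedy allocation in which a job's PU count only grows by doubling."""
    alloc = {job.name: 0 for job in jobs}
    free = total_pus

    # Seed each job with one PU while any remain.
    for job in jobs:
        if free == 0:
            break
        alloc[job.name] = 1
        free -= 1

    # Repeatedly double the job that gains the most estimated time.
    while True:
        best, best_gain = None, 0.0
        for job in jobs:
            n = alloc[job.name]
            if n == 0 or n > free:  # doubling needs n additional PUs
                continue
            gain = time_left(job, n) - time_left(job, 2 * n)
            if gain > best_gain:
                best, best_gain = job, gain
        if best is None:
            break
        free -= alloc[best.name]
        alloc[best.name] *= 2
    return alloc


# Hypothetical workload: two jobs with the same sublinear speed-up curve.
demo = schedule(
    [
        Job("a", 100.0, lambda n: n ** 0.9),
        Job("b", 10.0, lambda n: n ** 0.9),
    ],
    total_pus=8,
)
```

In this sketch the longer job is doubled first because it saves more estimated time per doubling; the shorter job receives the PUs that remain once the longer job can no longer double.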
Public/Granted literature
- US20200159589A1 SYSTEM AND METHOD FOR DYNAMIC SCHEDULING OF DISTRIBUTED DEEP LEARNING TRAINING JOBS, Public/Granted day: 2020-05-21