Gap-aware mitigation of gradient staleness
Abstract:
Disclosed embodiments are a computing system and a computer-implemented method for distributed training of a machine learning model over a plurality of computing nodes in a plurality of iterations, characterized by gradient-gap-based mitigation of the gradient staleness problem. The disclosed method evaluates the staleness of a gradient based on the difference in gradients between a central point, for example an iteration's common starting point, and the point reached by the respective computing node during one or more iterations. The method then aggregates the update steps from the plurality of computing nodes, giving more weight to computing nodes having a lesser change in the gradient.
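The weighting described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: the abstract does not specify how the gradient difference is measured or how it maps to a weight, so the Euclidean norm, the 1 / (1 + gap) weighting, and the function names `gradient_gap` and `aggregate_updates` are all hypothetical choices made here for illustration.

```python
import math


def gradient_gap(central_grad, node_grad):
    """Euclidean distance between the gradient at the central point and the
    gradient at the point a node reached (the norm is an assumption)."""
    return math.sqrt(sum((c - n) ** 2 for c, n in zip(central_grad, node_grad)))


def aggregate_updates(central_grad, node_grads, node_steps):
    """Combine per-node update steps, weighting nodes with a smaller
    gradient gap more heavily.

    The 1 / (1 + gap) weighting is an assumed choice; the abstract only
    states that a lesser change in the gradient receives more weight.
    """
    gaps = [gradient_gap(central_grad, g) for g in node_grads]
    raw = [1.0 / (1.0 + gap) for gap in gaps]
    total = sum(raw)
    weights = [r / total for r in raw]  # normalized so the weights sum to 1

    dim = len(node_steps[0])
    combined = [
        sum(w * step[i] for w, step in zip(weights, node_steps))
        for i in range(dim)
    ]
    return combined, weights


# Two nodes: the first sees the same gradient as the central point (gap 0),
# the second has drifted (gap 3), so the first node's step dominates.
central = [1.0, 0.0]
grads = [[1.0, 0.0], [1.0, 3.0]]
steps = [[-0.1, 0.0], [-0.5, -0.5]]
combined, weights = aggregate_updates(central, grads, steps)
print(weights)
print(combined)
```

In this sketch a node whose gradient matches the central point's gradient is treated as not stale and keeps its full raw weight, while a node whose gradient has changed more contributes proportionally less to the aggregated step.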