Abstract:
An apparatus and method are described herein for adaptive thread scheduling in a transactional memory environment. The number of conflicts a thread experiences over time is tracked, and if the conflicts exceed a threshold, the thread may be delayed (adaptively scheduled) to avoid conflicts with competing threads. A more complex version may track the number of transaction aborts within a first thread that are caused by a second thread over a period, as well as the total number of transactions executed by the first thread over that period. From this tracking, a conflict ratio is determined for the first thread with regard to the second thread. When the first thread is to be scheduled, it may be delayed if the second thread is running and the conflict ratio exceeds a conflict ratio threshold.
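The abstract above describes tracking per-thread-pair abort counts, deriving a conflict ratio, and delaying a thread when the ratio is too high while the conflicting thread runs. The following is a minimal sketch of that decision, not the patented implementation; the counter structure, field names, and the should_delay helper are assumptions made for illustration.

    // Minimal sketch of per-thread-pair conflict tracking (hypothetical types/names).
    #include <atomic>
    #include <cstdint>

    struct ConflictCounters {
        // Aborts of the first thread that were caused by the second thread in the window.
        std::atomic<uint64_t> aborts_caused_by_other{0};
        // Total transactions the first thread executed in the same window.
        std::atomic<uint64_t> transactions_executed{0};
    };

    // Called by the scheduler when the first thread becomes runnable.
    // Returns true if the thread should be delayed (adaptively scheduled) this round.
    bool should_delay(const ConflictCounters& c,
                      bool second_thread_running,
                      double conflict_ratio_threshold) {
        uint64_t total = c.transactions_executed.load(std::memory_order_relaxed);
        if (total == 0) return false;
        double ratio = static_cast<double>(
            c.aborts_caused_by_other.load(std::memory_order_relaxed)) / total;
        return second_thread_running && ratio > conflict_ratio_threshold;
    }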
Abstract:
Generally, this disclosure provides systems, devices, methods and computer readable media for implementing function callback requests between a first processor (e.g., a GPU) and a second processor (e.g., a CPU). The system may include a shared virtual memory (SVM) coupled to the first and second processors, the SVM configured to store at least one double-ended queue (Deque). An execution unit (EU) of the first processor may be associated with a first of the Deques and configured to push the callback requests to that first Deque. A request handler thread executing on the second processor may be configured to: pop one of the callback requests from the first Deque; execute a function specified by the popped callback request; and generate a completion signal to the EU in response to completion of the function.
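The flow described above (EU pushes callback requests into a Deque in shared virtual memory; a CPU request handler pops a request, runs the function, and signals completion) can be pictured with the short sketch below. It is an assumption-laden stand-in: a real SVM Deque would be a lock-free work-stealing structure shared across the processors, whereas this sketch uses a mutex-protected std::deque and a polled completion flag purely to show the producer/consumer roles; all names are hypothetical.

    // Minimal sketch of the callback-request flow (hypothetical names).
    #include <atomic>
    #include <deque>
    #include <functional>
    #include <mutex>
    #include <optional>

    struct CallbackRequest {
        std::function<void()> fn;        // function the CPU should execute
        std::atomic<bool>* completion;   // flag the EU polls for completion
    };

    struct SvmDeque {
        std::deque<CallbackRequest> q;
        std::mutex m;

        void push(CallbackRequest r) {             // EU side: push a request
            std::lock_guard<std::mutex> g(m);
            q.push_back(std::move(r));
        }
        std::optional<CallbackRequest> pop() {     // CPU handler side: pop a request
            std::lock_guard<std::mutex> g(m);
            if (q.empty()) return std::nullopt;
            CallbackRequest r = std::move(q.front());
            q.pop_front();
            return r;
        }
    };

    // Request handler thread loop on the second processor (CPU).
    void handler_loop(SvmDeque& dq, std::atomic<bool>& stop) {
        while (!stop.load()) {
            if (auto r = dq.pop()) {
                r->fn();                            // execute the requested function
                r->completion->store(true);         // signal completion back to the EU
            }
        }
    }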
Abstract:
Various embodiments are generally directed to techniques for assigning instances of blocks of instructions of a routine to one of multiple types of core of a heterogeneous set of cores of a processor component. An apparatus to select types of cores includes a processor component; a core selection component for execution by the processor component to select a core of multiple cores to execute an initial subset of multiple instances of an instruction block in parallel based on characteristics of instructions of the instruction block, and to select a core of the multiple cores to execute remaining instances of the multiple instances of the instruction block in parallel based on characteristics of execution of the initial subset stored in an execution database; and a monitoring component for execution by the processor component to record the characteristics of execution of the initial subset in the execution database. Other embodiments are described and claimed.
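The two-phase selection described above (choose a core for an initial subset of instances from static instruction characteristics, record how that subset actually executes, then choose a core for the remaining instances from the recorded data) is sketched below under stated assumptions: the core types, the ExecStats metrics, and the execution_db map are all hypothetical simplifications of the claimed components.

    // Minimal sketch of two-phase core selection (hypothetical names and metrics).
    #include <map>
    #include <string>

    enum class CoreType { BigCore, LittleCore };

    struct ExecStats { double avg_cycles = 0; double avg_energy = 0; };

    // "Execution database": characteristics recorded for the initial subset, keyed by block id.
    std::map<std::string, ExecStats> execution_db;

    // Phase 1: pick a core for the initial instances from static instruction characteristics.
    CoreType select_initial(bool has_heavy_vector_ops) {
        return has_heavy_vector_ops ? CoreType::BigCore : CoreType::LittleCore;
    }

    // Monitoring component: record measured characteristics of the initial subset.
    void record(const std::string& block_id, double cycles, double energy) {
        execution_db[block_id] = ExecStats{cycles, energy};
    }

    // Phase 2: pick a core for the remaining instances from the recorded characteristics.
    CoreType select_remaining(const std::string& block_id, double cycle_budget) {
        auto it = execution_db.find(block_id);
        if (it == execution_db.end()) return CoreType::BigCore;  // no data yet: be conservative
        return (it->second.avg_cycles > cycle_budget) ? CoreType::BigCore
                                                      : CoreType::LittleCore;
    }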
Abstract:
Described is a mechanism for facilitating intelligent resource allocation for deep learning on autonomous machines. A method of embodiments, as described herein, includes detecting one or more sets of data from one or more sources over one or more networks, and inserting a library into a neural network application to determine the optimal point at which to apply frequency scaling without degrading the performance of the neural network application on a computing device.
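One way to picture the injected library's decision point is sketched below: profile each phase of the network and lower the frequency only for phases that appear memory-bound, so end-to-end performance is preserved. This is a rough interpretation, not the disclosed implementation; profile_phase, maybe_scale, and the request_frequency_mhz stub are all hypothetical.

    // Minimal sketch of a frequency-scaling decision point (hypothetical API).
    #include <chrono>
    #include <cstdio>

    // Hypothetical stand-in for a platform frequency-control call.
    void request_frequency_mhz(int mhz) {
        std::printf("set core frequency: %d MHz\n", mhz);
    }

    struct PhaseProfile { double seconds = 0; double arithmetic_intensity = 0; };

    // Wrapped around each layer/phase of the neural network by the injected library.
    template <typename Fn>
    PhaseProfile profile_phase(Fn&& run_phase, double flops, double bytes_moved) {
        auto t0 = std::chrono::steady_clock::now();
        run_phase();
        auto t1 = std::chrono::steady_clock::now();
        PhaseProfile p;
        p.seconds = std::chrono::duration<double>(t1 - t0).count();
        p.arithmetic_intensity = flops / bytes_moved;  // low value suggests memory-bound
        return p;
    }

    // Decision point: apply frequency scaling only to memory-bound phases.
    void maybe_scale(const PhaseProfile& p, double intensity_threshold,
                     int low_mhz, int high_mhz) {
        request_frequency_mhz(p.arithmetic_intensity < intensity_threshold ? low_mhz
                                                                           : high_mhz);
    }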
Abstract:
Described is a mechanism for facilitating intelligent collection of data and intelligent management of autonomous machines. A method of embodiments, as described herein, includes detecting one or more sets of data from one or more sources over one or more networks, and combining a first computation performed locally at a local computing device with a second computation performed remotely at a remote computing device in communication with the local computing device over the one or more networks, wherein the first computation consumes low energy and the second computation consumes high energy.
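The split described above, a low-energy local computation combined with a high-energy remote one, is illustrated by the short sketch below. It is only an assumed shape of such a pipeline: local_screen, remote_analyze, and the escalation threshold are hypothetical, and the remote call is a stub standing in for network offload.

    // Minimal sketch of combining a low-energy local step with a high-energy remote step.
    #include <cstdio>
    #include <numeric>
    #include <vector>

    // Low-energy local computation: a cheap screening pass over the data.
    bool local_screen(const std::vector<float>& sample, float threshold) {
        if (sample.empty()) return false;
        float mean = std::accumulate(sample.begin(), sample.end(), 0.0f) / sample.size();
        return mean > threshold;   // only "interesting" samples are escalated
    }

    // High-energy remote computation: stand-in for offloading over the network.
    float remote_analyze(const std::vector<float>& sample) {
        std::printf("offloading %zu values to remote device\n", sample.size());
        return 0.0f;               // placeholder result
    }

    // Combined flow: run the cheap step locally, escalate to the remote device only when needed.
    void process(const std::vector<float>& sample) {
        if (local_screen(sample, /*threshold=*/0.5f)) {
            remote_analyze(sample);
        }
    }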