Work stealing in heterogeneous computing systems

    公开(公告)号:US11138048B2

    公开(公告)日:2021-10-05

    申请号:US15391549

    申请日:2016-12-27

    Abstract: A work stealer apparatus includes a determination module. The determination module is to determine to steal work from a first hardware computation unit of a first type for a second hardware computation unit of a second type that is different than the first type. The work is to be queued in a first work queue, which is to correspond to the first hardware computation unit, and which is to be stored in a shared memory that is to be shared by the first and second hardware computation units. A synchronized work stealer module is to steal the work through a synchronized memory access to the first work queue. The synchronized memory access is to be synchronized relative to memory accesses to the first work queue from the first hardware computation unit.

    Function callback mechanism between a Central Processing Unit (CPU) and an auxiliary processor

    公开(公告)号:US10706496B2

    公开(公告)日:2020-07-07

    申请号:US16282553

    申请日:2019-02-22

    Abstract: Generally, this disclosure provides systems, devices, methods and computer readable media for implementing function callback requests between a first processor (e.g., a GPU) and a second processor (e.g., a CPU). The system may include a shared virtual memory (SVM) coupled to the first and second processors, the SVM configured to store at least one double-ended queue (Deque). An execution unit (EU) of the first processor may be associated with a first of the Deques and configured to push the callback requests to that first Deque. A request handler thread executing on the second processor may be configured to: pop one of the callback requests from the first Deque; execute a function specified by the popped callback request; and generate a completion signal to the EU in response to completion of the function.

    ADAPTIVE SCHEDULING FOR TASK ASSIGNMENT AMONG HETEROGENEOUS PROCESSOR CORES

    公开(公告)号:US20190080429A1

    公开(公告)日:2019-03-14

    申请号:US16185965

    申请日:2018-11-09

    CPC classification number: G06T1/20 G06F3/14 G09G5/001 G09G5/363 G09G2360/08

    Abstract: Generally, this disclosure provides systems, devices, methods and computer readable media for adaptive scheduling of task assignment among heterogeneous processor cores. The system may include any number of CPUs, a graphics processing unit (GPU) and memory configured to store a pool of work items to be shared by the CPUs and GPU. The system may also include a GPU proxy profiling module associated with one of the CPUs to profile execution of a first portion of the work items on the GPU. The system may further include profiling modules, each associated with one of the CPUs, to profile execution of a second portion of the work items on each of the CPUs. The measured profiling information from the CPU profiling modules and the GPU proxy profiling module is used to calculate a distribution ratio for execution of a remaining portion of the work items between the CPUs and the GPU.

Patent Agency Ranking