Abstract:
Examples are disclosed for adjusting a performance state of a graphics subsystem and/or a processor based on a comparison of an average frame rate to a target frame rate and also based on whether the graphics subsystem is in a burst mode or sustained mode of operation.
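The abstract above does not say how the operating mode enters the decision, so the following is only a minimal sketch under stated assumptions: the mode selects a tolerance band around the target frame rate (narrow in burst mode, wide in sustained mode), and performance states are integer indices where a higher index means higher performance. The class name `PerfStateGovernor`, the window length, and the margin values are all hypothetical.

```python
from collections import deque


class PerfStateGovernor:
    """Illustrative governor; all names and constants are assumptions."""

    def __init__(self, target_fps, num_states=8, window=30,
                 burst_margin=0.02, sustained_margin=0.10):
        self.target_fps = target_fps
        self.num_states = num_states
        # Rolling window of recent frame-rate samples.
        self.frame_rates = deque(maxlen=window)
        # Assumption: the mode only changes the tolerance band.
        self.margins = {"burst": burst_margin, "sustained": sustained_margin}
        self.state = 0  # lowest performance state

    def record_frame(self, fps):
        self.frame_rates.append(fps)

    def average_fps(self):
        return sum(self.frame_rates) / len(self.frame_rates)

    def adjust(self, mode):
        """Compare the average frame rate to the target and step the state."""
        if not self.frame_rates:
            return self.state
        margin = self.margins[mode]
        avg = self.average_fps()
        if avg < self.target_fps * (1 - margin):
            # Below the band: raise the performance state, capped at the top.
            self.state = min(self.state + 1, self.num_states - 1)
        elif avg > self.target_fps * (1 + margin):
            # Above the band: lower the performance state, floored at zero.
            self.state = max(self.state - 1, 0)
        return self.state
```

With a 60 fps target and an average of 57 fps, this sketch would raise the state in burst mode but leave it unchanged in sustained mode, since 57 fps falls inside the wider sustained band.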
Abstract:
Methods and apparatus relating to transactional page fault handling. In an example, an apparatus comprises a processor to divide an execution thread of a graphics workload into a set of transactions which are to be executed atomically, initiate the execution of the thread, and manage the execution of the thread according to one of a first protocol in response to a determination that a page fault occurred in the execution of a transaction, or a second protocol in response to a determination that a page fault did not occur in the execution of a transaction. Other embodiments are also disclosed and claimed.
Abstract:
A computing device for performing scheduling operations for graphics hardware is described herein. The computing device includes a central processing unit (CPU) that is configured to execute an application. The computing device also includes a graphics scheduler configured to operate independently of the CPU. The graphics scheduler is configured to receive work queues relating to workloads from the application that are to execute on the CPU and perform scheduling operations for any of a number of graphics engines based on the work queues.
Abstract:
Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors, wherein the plurality of processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling of the plurality of groups of threads including the plurality of processors to apply a bias for scheduling the plurality of groups of threads according to a cache locality for the one or more caches.
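One way to picture a cache-locality bias is as a discount on the scheduling cost of a processor whose cache already holds the data a thread group needs. The sketch below is an assumption-laden illustration, not the claimed method: the function `pick_processor`, the load metric (queued groups per processor), and the `locality_bias` value are all hypothetical.

```python
def pick_processor(group_data_id, loads, cached, locality_bias=2):
    """Pick a graphics processor for one thread group.

    loads:  dict mapping processor name -> number of queued thread groups
    cached: dict mapping processor name -> set of data ids in its cache
    The bias lowers the cost of a processor that already caches the data.
    """
    def cost(proc):
        discount = locality_bias if group_data_id in cached[proc] else 0
        return loads[proc] - discount

    # Sort names first so ties are broken deterministically.
    return min(sorted(loads), key=cost)
```

Because the bias is a finite discount rather than a hard rule, a heavily loaded processor can still lose the thread group to an idle one even when it holds the data.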
Abstract:
Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, and assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
Abstract:
Methods and systems may provide for receiving, at a controller, a notification of a workload submission by an application lacking system level privileges. Additionally, the controller may be used to schedule a transfer of the workload submission to a graphics hardware component for execution, wherein the controller has system level privileges. In one example, the transfer bypasses an operating system and a kernel mode driver associated with the graphics hardware component.
Abstract:
Transitions to ring 0 each time an application wants to use an adjunct processor are avoided, saving central processor operating cycles and improving efficiency. Instead, each application is initially registered and set up to use adjunct processor resources in ring 3.
Abstract:
Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs, including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache; and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
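The threshold rule in this abstract can be sketched in a few lines. The function name `prefetch_target`, the default threshold of 0.8, and the handling of the no-accesses case are assumptions made for illustration; the abstract specifies only the comparison itself.

```python
def prefetch_target(l1_hits, l1_accesses, threshold=0.8):
    """Return the cache level a prefetch may fill, per the hit-rate rule.

    A hit rate at or above the threshold limits the prefetch to L3;
    a hit rate below the threshold allows the prefetch into L1.
    """
    if l1_accesses == 0:
        # Assumption: with no samples, treat the hit rate as zero.
        return "L1"
    hit_rate = l1_hits / l1_accesses
    return "L3" if hit_rate >= threshold else "L1"
```

The intuition is that an L1 cache that is already hitting well gains little from prefetched lines and may lose useful ones to eviction, so the prefetch is held back in L3.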
Abstract:
Methods and apparatus relating to predictive page fault handling. In an example, an apparatus comprises a processor to receive a virtual address that triggered a page fault for a compute process, check a virtual memory space for a virtual memory allocation for the compute process that triggered the page fault, and manage the page fault according to one of a first protocol in response to a determination that the virtual address that triggered the page fault is a last page in the virtual memory allocation for the compute process, or a second protocol in response to a determination that the virtual address that triggered the page fault is not a last page in the virtual memory allocation for the compute process. Other embodiments are also disclosed and claimed.
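The abstract does not describe what either protocol does, so the sketch below covers only the selection step: deciding whether the faulting address lies in the last page of the allocation. The function `select_protocol`, the 4 KiB page size, and the assumption that the allocation base is page-aligned are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def select_protocol(fault_addr, alloc_base, alloc_size, page_size=PAGE_SIZE):
    """Return "first" if the fault hit the allocation's last page, else "second".

    Assumes alloc_base is page-aligned and alloc_size is positive.
    """
    if not (alloc_base <= fault_addr < alloc_base + alloc_size):
        raise ValueError("faulting address is outside the allocation")
    # Index of the final page that the allocation touches.
    last_page_index = (alloc_size - 1) // page_size
    # Index of the page containing the faulting address.
    fault_page_index = (fault_addr - alloc_base) // page_size
    return "first" if fault_page_index == last_page_index else "second"
```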