Abstract:
An apparatus to facilitate thread scheduling is disclosed. The apparatus includes logic to store barrier usage data based on the number of barrier messages in an application kernel and a scheduler to schedule execution of threads across a plurality of multiprocessors based on the barrier usage data.
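As a rough illustration of a scheduler driven by barrier usage data, the C++ sketch below packs a barrier-heavy kernel onto fewer multiprocessors (so more of its thread groups are co-resident and can overlap barrier stalls) and spreads barrier-light work across all of them. The KernelProfile type, the heuristic threshold, and the packing policy are assumptions for illustration, not the claimed mechanism.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Hypothetical barrier usage data gathered from an application kernel.
struct KernelProfile {
    int threadGroups;     // thread groups awaiting dispatch
    int barrierMessages;  // barrier messages counted in the kernel
};

// Map each thread group to a multiprocessor. A barrier-heavy kernel is packed
// onto fewer SMs so more of its groups are co-resident and can overlap their
// barrier stalls; a barrier-light kernel is spread across every SM.
// The threshold (8 messages per group) is an invented heuristic.
std::vector<int> schedule(const KernelProfile& k, int numSMs) {
    bool heavy = k.barrierMessages > 8 * k.threadGroups;
    int width = heavy ? std::max(1, numSMs / 2) : numSMs;
    std::vector<int> placement(k.threadGroups);
    for (int g = 0; g < k.threadGroups; ++g) placement[g] = g % width;
    return placement;
}

int main() {
    KernelProfile heavyKernel{16, 400};  // 400 barrier messages, 16 groups
    for (int sm : schedule(heavyKernel, 8)) std::printf("%d ", sm);
    std::printf("\n");  // groups land on SMs 0..3 only
}
```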
Abstract:
Methods and apparatus relating to techniques for scheduling threads in a multi-context processing system. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to determine a first number of threads to be scheduled for each context of a plurality of contexts in a multi-context processing system, allocate a second number of streaming multiprocessors (SMs) to the respective plurality of contexts, and dispatch threads from each context only to the streaming multiprocessor(s) allocated to that context. Other embodiments are also disclosed and claimed.
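A minimal sketch of the described allocation, under the assumption that SMs are handed out roughly in proportion to each context's thread demand; the Context type and the proportional split are illustrative only.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical per-context bookkeeping: how many threads the context wants to
// run, and which SM indices have been reserved for it.
struct Context {
    int pendingThreads;
    std::vector<int> allocatedSMs;
};

// Step 1: determine thread demand per context.
// Step 2: hand out the SM pool proportionally (a real allocator would treat
//         remainders more fairly).
// Step 3: threads from a context are then dispatched only to SMs it owns.
void allocate(std::vector<Context>& ctxs, int totalSMs) {
    long total = 0;
    for (auto& c : ctxs) total += c.pendingThreads;
    int next = 0;
    for (auto& c : ctxs) {
        int share = total ? int((long)totalSMs * c.pendingThreads / total) : 0;
        if (share == 0 && c.pendingThreads > 0) share = 1;  // minimum one SM
        for (int i = 0; i < share && next < totalSMs; ++i)
            c.allocatedSMs.push_back(next++);
    }
}

int main() {
    std::vector<Context> ctxs{{3000, {}}, {1000, {}}};
    allocate(ctxs, 8);
    for (size_t i = 0; i < ctxs.size(); ++i)
        std::printf("context %zu owns %zu SMs\n", i, ctxs[i].allocatedSMs.size());
}
```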
Abstract:
Methods and apparatus relating to techniques for profile-based power management. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to collect user information for a user of a data processing device, generate a user profile for the user of the data processing device from the user information, and set a power profile for a processor in the data processing device using the user profile. Other embodiments are also disclosed and claimed.
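One way to read the abstract is as a classify-then-configure pipeline. The sketch below, with invented UserInfo fields and thresholds, derives a processor power profile from collected usage statistics; the actual profiling inputs and policies are not specified by the abstract.

```cpp
#include <cstdio>

// Hypothetical usage statistics collected for a user of the device.
struct UserInfo {
    double avgCpuLoad;      // 0.0 .. 1.0, averaged over an observation window
    double hoursOnBattery;  // per day
};

enum class PowerProfile { Performance, Balanced, PowerSaver };

// Generate a user profile and pick a power profile from it. The
// classification thresholds are illustrative, not from the patent.
PowerProfile profileFor(const UserInfo& u) {
    if (u.avgCpuLoad > 0.6 && u.hoursOnBattery < 2.0)
        return PowerProfile::Performance;  // desk-bound heavy user
    if (u.hoursOnBattery > 6.0)
        return PowerProfile::PowerSaver;   // mostly mobile user
    return PowerProfile::Balanced;
}

int main() {
    UserInfo commuter{0.25, 8.0};
    std::printf("profile=%d\n", (int)profileFor(commuter));  // 2: PowerSaver
}
```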
Abstract:
Hybrid multi-level memory architecture technologies are described. A System on Chip (SOC) includes multiple functional units and a multi-level memory controller (MLMC) coupled to the functional units. The MLMC is coupled to a hybrid multi-level memory architecture including a first-level dynamic random access memory (DRAM) (near memory) that is located on-package of the SOC and a second-level DRAM (far memory) that is located off-package of the SOC. The MLMC presents the first-level DRAM and the second-level DRAM as a contiguous addressable memory space, providing the first-level DRAM to software as memory capacity additional to that of the second-level DRAM. The first-level DRAM does not store a copy of the contents of the second-level DRAM.
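The key property is that near memory extends the address space rather than shadowing far memory. A behavioral sketch of such an address decode, with invented capacities, might look like this:

```cpp
#include <cstdint>
#include <cstdio>

// Near and far DRAM form one contiguous system address space, so the near
// memory adds capacity instead of duplicating far-memory contents.
// The sizes below are assumptions for illustration.
constexpr uint64_t kNearSize = 1ull << 30;  // 1 GiB on-package DRAM
constexpr uint64_t kFarSize  = 8ull << 30;  // 8 GiB off-package DRAM

enum class Level { Near, Far };
struct Decoded { Level level; uint64_t offset; };

// Addresses 0..kNearSize-1 decode to near memory; everything above decodes to
// far memory. No address exists in both levels, hence no duplicated data.
Decoded decode(uint64_t sysAddr) {
    if (sysAddr < kNearSize) return {Level::Near, sysAddr};
    return {Level::Far, sysAddr - kNearSize};
}

int main() {
    Decoded d = decode(kNearSize + 4096);
    std::printf("level=%s offset=%llu\n",
                d.level == Level::Near ? "near" : "far",
                (unsigned long long)d.offset);
}
```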
Abstract:
A cache controller with a pattern recognition mechanism can identify patterns in cache lines. Instead of transmitting the entire data of the cache line to a destination device, the cache controller can generate a meta signal to represent the identified bit pattern. The cache controller transmits the meta signal to the destination in place of at least part of the cache line.
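As a minimal sketch, assuming the recognized pattern is a single repeated byte (all-zero lines being the common case), the controller could substitute a one-byte meta signal for the 64-byte line; which patterns a real controller recognizes is not stated in the abstract.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <optional>

// A 64-byte cache line and a compact "meta signal" standing in for it.
using CacheLine = std::array<uint8_t, 64>;
struct MetaSignal { uint8_t pattern; };  // the repeated byte value

// If every byte of the line matches the first byte, a one-byte meta signal
// can be transmitted to the destination in place of the full line.
std::optional<MetaSignal> recognize(const CacheLine& line) {
    for (uint8_t b : line)
        if (b != line[0]) return std::nullopt;  // no recognizable pattern
    return MetaSignal{line[0]};
}

int main() {
    CacheLine zeros{};  // value-initialized: all bytes zero
    if (auto m = recognize(zeros))
        std::printf("send meta signal: repeat 0x%02x\n", m->pattern);
    else
        std::printf("send full 64-byte line\n");
}
```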
Abstract:
A processing device comprises an instruction execution unit, a memory agent, and pinning logic to pin memory pages in a multi-level memory system upon request by the memory agent. The pinning logic includes an agent interface module to receive, from the memory agent, a pin request indicating a first memory page in the multi-level memory system, the multi-level memory system comprising a near memory and a far memory. The pinning logic further includes a memory interface module to retrieve the first memory page from the far memory and write the first memory page to the near memory. In addition, the pinning logic includes a descriptor table management module to mark the first memory page as pinned in the near memory by setting a pinning bit corresponding to the first memory page in a cache descriptor table, and to prevent the first memory page from being evicted from the near memory while it is marked as pinned.
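The bookkeeping the abstract describes can be modeled compactly: a pin request installs the page in near memory and sets its pinning bit in a descriptor table, and the eviction path refuses pages whose bit is set. In the sketch below the far-to-near data copy is elided and the table layout is an assumption.

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>

// Hypothetical cache descriptor table: one entry per page resident in near
// memory, with the pinning bit the eviction path must honor.
struct Descriptor { bool pinned = false; };
std::unordered_map<uint64_t, Descriptor> descriptorTable;  // keyed by page number

// Handle a pin request from a memory agent: fetch the page from far memory,
// install it in near memory, then set its pinning bit.
void pinPage(uint64_t page) {
    // memory interface module: far -> near copy would happen here
    descriptorTable[page].pinned = true;  // descriptor table management module
}

// Eviction must skip pinned pages.
bool tryEvict(uint64_t page) {
    auto it = descriptorTable.find(page);
    if (it != descriptorTable.end() && it->second.pinned) return false;
    descriptorTable.erase(page);
    return true;
}

int main() {
    pinPage(42);
    std::printf("evicted=%d\n", tryEvict(42));  // 0: page 42 stays resident
}
```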
Abstract:
Clock signal generation circuitry. A frequency multiplier is coupled to receive a clock signal and to generate a frequency-multiplied clock signal. A switching circuit is coupled to receive at least two reference clock signals. The switching circuit provides one of the reference clock signals in response to a reference select signal. A phase locked loop (PLL) is coupled to receive the frequency-multiplied clock signal and the selected reference clock signal. The PLL generates an output clock signal.
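A behavioral model of the signal path (frequency multiplier into a PLL, with a mux choosing the reference) may clarify the topology; all frequencies and the multiplier ratio below are invented, and the PLL is reduced to a toy lock condition.

```cpp
#include <cstdio>

// Frequency multiplier: scales the incoming clock by a fixed ratio.
double frequencyMultiplier(double clkMHz, int ratio) { return clkMHz * ratio; }

// Switching circuit: a 2:1 mux driven by the reference select signal.
double selectReference(double refA, double refB, bool referenceSelect) {
    return referenceSelect ? refB : refA;
}

// Toy PLL: the output tracks the frequency-multiplied clock once a valid
// reference is present; a real PLL would model phase comparison and lock time.
double pllOutput(double multipliedMHz, double refMHz) {
    return (refMHz > 0.0) ? multipliedMHz : 0.0;  // no lock without a reference
}

int main() {
    double fm = frequencyMultiplier(100.0, 8);        // 100 MHz -> 800 MHz
    double ref = selectReference(25.0, 19.2, false);  // select reference A
    std::printf("output clock: %.1f MHz\n", pllOutput(fm, ref));
}
```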
Abstract:
Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores is associated with a first memory channel and the second set of processing cores is associated with a second memory channel.
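A small sketch of the routing implied by the last two sentences, with hypothetical core-type names and a fixed core-set-to-channel mapping; the abstract does not say what the two core types are.

```cpp
#include <cstdio>

// Two sets of processing cores of different types share a multiprocessor's
// register file, but each set moves its operands over its own memory channel.
enum class CoreType { TypeA, TypeB };

// One memory channel per core set; the mapping itself is an assumption.
int memoryChannelFor(CoreType t) {
    return t == CoreType::TypeA ? 0 : 1;
}

int main() {
    std::printf("TypeA -> channel %d, TypeB -> channel %d\n",
                memoryChannelFor(CoreType::TypeA),
                memoryChannelFor(CoreType::TypeB));
}
```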
Abstract:
An apparatus to facilitate barrier state save and restore for preemption in a graphics environment is disclosed. The apparatus includes processing resources to execute a plurality of execution threads included in a thread group (TG) and mid-thread preemption barrier save and restore hardware circuitry to: initiate an exception handling routine in response to a mid-thread preemption event, the exception handling routine to cause a barrier signaling event to be issued; receive indication of a valid designated thread status for a thread of the TG in response to the barrier signaling event; and, in response to receiving the indication of the valid designated thread status for the thread of the TG, cause, by the thread of the TG having the valid designated thread status, a barrier save routine and a barrier restore routine to be initiated for named barriers of the TG.
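The control flow reads as: preemption event, exception handler, barrier signaling event, and then exactly the thread holding valid designated thread status runs the save (and later the restore) for the TG's named barriers. A minimal C++ model of that flow, with invented barrier state:

```cpp
#include <cstdio>
#include <vector>

// Invented state for one thread group's named barriers.
struct NamedBarrier { int arrived = 0; };
std::vector<NamedBarrier> barriers(4);  // named barriers of the TG
std::vector<NamedBarrier> savedState;

void barrierSave()    { savedState = barriers; }  // barrier save routine
void barrierRestore() { barriers = savedState; }  // barrier restore routine

// Exception handler entered on a mid-thread preemption event. The barrier
// signaling event designates one thread; only that thread saves and restores.
void onMidThreadPreemption(int designatedThread, int thisThread) {
    bool validDesignatedStatus = (thisThread == designatedThread);
    if (validDesignatedStatus) {
        barrierSave();     // before the TG's context is preempted
        barrierRestore();  // when the TG resumes execution
    }
}

int main() {
    barriers[2].arrived = 7;
    onMidThreadPreemption(/*designated=*/0, /*this thread=*/0);
    std::printf("restored arrived=%d\n", barriers[2].arrived);  // 7
}
```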
Abstract:
One embodiment provides an apparatus comprising an interface to a system interconnect and a graphics processor coupled to the interface, the graphics processor comprising circuitry configured to compact sample data for multiple sample locations of a pixel, map the multiple sample locations to memory locations that store compacted sample data, the memory locations in a memory of the graphics processor, apply lossless compression to the compacted sample data, and update a compression control surface associated with the memory locations, the compression control surface to specify a compression status for the memory locations.
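As a sketch of the compaction step and its interaction with the compression control surface (CCS): if all sample locations of a pixel carry one color, store it once and mark the block compacted in the CCS. The status encoding and types below are assumptions for illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>

// Invented CCS status codes; lossless compression could follow compaction.
enum CcsStatus : uint8_t { Uncompressed = 0, Compacted = 1, Lossless = 2 };

struct Pixel {
    std::array<uint32_t, 4> samples;  // RGBA8 packed, one per sample location
};

// If a pixel's sample locations all hold the same color, map them to a single
// stored value and report the compacted status for the CCS update.
CcsStatus compact(const Pixel& p, uint32_t& storedColor) {
    for (uint32_t s : p.samples)
        if (s != p.samples[0]) return Uncompressed;  // distinct samples: keep all
    storedColor = p.samples[0];  // every sample location maps to this value
    return Compacted;
}

int main() {
    Pixel p{{0xffff0000u, 0xffff0000u, 0xffff0000u, 0xffff0000u}};
    uint32_t color = 0;
    CcsStatus st = compact(p, color);
    std::printf("ccs status=%d color=0x%08x\n", st, color);
}
```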