Abstract:
Methods and apparatus relating to techniques for multi-tile memory management. In an example, an apparatus comprises a cache memory, a high-bandwidth memory, a shader core communicatively coupled to the cache memory and comprising a processing element to decompress a first data element extracted from an in-memory database in the cache memory and having a first bit length to generate a second data element having a second bit length, greater than the first bit length, and an arithmetic logic unit (ALU) to compare the second data element to a target value provided in a query of the in-memory database. Other embodiments are also disclosed and claimed.
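A minimal sketch of the decompress-then-compare flow described above, assuming a hypothetical 4-bit packed encoding expanded to 32-bit values before comparison; the function name, widths, and layout are illustrative, not taken from the claims:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical example: database values packed 4 bits each (first bit length).
// Decompress each to a 32-bit value (second, greater bit length), then have
// the "ALU" step compare it against the query's target value.
std::vector<uint32_t> scan_equal(const std::vector<uint8_t>& packed,
                                 size_t count, uint32_t target) {
    std::vector<uint32_t> matches;                 // indices of matching rows
    for (size_t i = 0; i < count; ++i) {
        uint8_t byte = packed[i / 2];
        uint32_t value = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4); // decompress
        if (value == target) matches.push_back(static_cast<uint32_t>(i)); // compare
    }
    return matches;
}

int main() {
    std::vector<uint8_t> packed = {0x21, 0x43, 0x25}; // values 1,2,3,4,5,2
    for (uint32_t idx : scan_equal(packed, 6, /*target=*/2))
        std::printf("match at row %u\n", idx);
}
```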
Abstract:
Systems, apparatuses and methods may provide for encryption-based technology. Data may be encrypted locally by a graphics processor that includes encryption engines. The graphics processor components may be verified with a root-of-trust and based on a collection of claims. The graphics processor may further be able to modify encrypted data from a non-pageable format to a pageable format. The graphics processor may further process data associated with a virtual machine based on a key that is known by the virtual machine and the graphics processor.
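The key-sharing model can be sketched as below; the toy XOR "cipher" is a deliberately insecure stand-in for the graphics processor's real encryption engines, used only to show that data encrypted by the VM can be processed by the GPU without the host ever seeing plaintext:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative only: XOR stands in for a real encryption engine.
std::vector<uint8_t> xor_cipher(const std::vector<uint8_t>& data, uint8_t key) {
    std::vector<uint8_t> out(data.size());
    for (size_t i = 0; i < data.size(); ++i) out[i] = data[i] ^ key; // not secure
    return out;
}

int main() {
    const uint8_t shared_key = 0x5A;                 // known to VM and GPU only
    std::vector<uint8_t> plaintext = {'g', 'p', 'u'};
    auto wire = xor_cipher(plaintext, shared_key);   // VM side: encrypt
    auto recovered = xor_cipher(wire, shared_key);   // GPU side: decrypt
    std::printf("%c%c%c\n", recovered[0], recovered[1], recovered[2]);
}
```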
Abstract:
Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate prefetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the prefetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the prefetch information is based at least in part on the software assistance.
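One way to picture the software assistance is as a stride/depth hint consumed by the prefetch generator; the hint structure and field names below are assumptions for illustration, not the patent's interface:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical software-assistance hint: the compiler or runtime tells the
// load-store path the stride and depth to use when a result address appears.
struct PrefetchHint { int64_t stride; int depth; };

// Prefetch generator: from the address a circuit element just produced,
// emit the next 'depth' addresses at 'stride' spacing.
std::vector<uint64_t> generate_prefetches(uint64_t result_addr,
                                          const PrefetchHint& hint) {
    std::vector<uint64_t> addrs;
    for (int i = 1; i <= hint.depth; ++i)
        addrs.push_back(result_addr + i * hint.stride);
    return addrs;
}

int main() {
    PrefetchHint hint{64, 4};                 // software says: 64B stride, 4 deep
    for (uint64_t a : generate_prefetches(0x1000, hint))
        std::printf("prefetch 0x%llx\n", (unsigned long long)a);
}
```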
Abstract:
Described herein are several embodiments which provide for enhanced data caching in combination with adaptive and dynamic compression to increase the storage efficiency and reduce the transmission bandwidth of data during input and output from a GPU. The techniques described herein can reduce the need to access off-chip memory, resulting in improved performance and reduced power for GPU operations. One embodiment provides for a graphics processing apparatus comprising a shader engine; one or more cache memories; cache control logic to control at least one of the one or more cache memories; and a codec unit coupled with the one or more cache memories, the codec unit configurable to perform lossless compression of read-only surface data upon storage to or eviction from the one or more cache memories.
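A simple run-length encoder can stand in for the codec unit to show the evict-time lossless compression idea; real GPU surface codecs are far more elaborate, so this is a sketch under that simplifying assumption:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative codec: run-length encode a cache line's bytes on eviction.
std::vector<uint8_t> rle_compress(const std::vector<uint8_t>& line) {
    std::vector<uint8_t> out;                     // (count, value) pairs
    for (size_t i = 0; i < line.size();) {
        uint8_t value = line[i], count = 0;
        while (i < line.size() && line[i] == value && count < 255) { ++i; ++count; }
        out.push_back(count);
        out.push_back(value);
    }
    return out;
}

int main() {
    std::vector<uint8_t> cache_line(64, 0x00);    // uniform surface data
    auto packed = rle_compress(cache_line);
    std::printf("64 bytes -> %zu bytes\n", packed.size()); // 64 -> 2
}
```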
Abstract:
In many cases, processors may change frequency sufficiently often to result in significant performance and power consumption losses. These performance and power consumption losses may be mitigated by changing the frequency using a squashing technique rather than a phase-locked-loop technique. The squashing technique involves simply eliminating clock pulses to reduce the frequency. This can be done more quickly, resulting in less overhead in some cases.
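The arithmetic of squashing is straightforward: delivering 1 of every N pulses yields an effective frequency of f/N with no PLL relock delay. A minimal sketch, with the keep-ratio N assumed for illustration:

```cpp
#include <cstdio>

// Clock squashing: instead of retuning a PLL, simply drop (squash)
// a fraction of the clock pulses.
int main() {
    const int N = 4;                      // keep 1 pulse in 4 -> quarter rate
    int delivered = 0;
    for (int pulse = 0; pulse < 16; ++pulse) {
        bool squashed = (pulse % N != 0); // gate this edge?
        if (!squashed) ++delivered;
        std::printf("pulse %2d: %s\n", pulse, squashed ? "squashed" : "delivered");
    }
    std::printf("effective frequency: %d/16 of input\n", delivered);
}
```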
Abstract:
A processor may operate at a first frequency level for a first time interval. The processor may automatically transition from the first frequency level to a sleep state after the first time interval. Then the processor automatically transitions from the sleep state back to the first frequency level after a second time interval. As a result, the processor may operate with reduced power consumption and higher performance.
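The effect is a duty cycle: bursts still run at full frequency while average power tracks the run/sleep ratio. A minimal sketch, with the interval lengths assumed for illustration:

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Duty-cycling sketch: run at the first frequency level for one interval,
// then drop to a sleep state for a second interval, and repeat.
int main() {
    using namespace std::chrono_literals;
    for (int cycle = 0; cycle < 3; ++cycle) {
        std::printf("cycle %d: active at full frequency\n", cycle);
        std::this_thread::sleep_for(10ms);   // first time interval (running)
        std::printf("cycle %d: sleep state\n", cycle);
        std::this_thread::sleep_for(5ms);    // second time interval (sleeping)
    }
}
```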
Abstract:
An apparatus and method are described for implementing memory management in a graphics processing system. For example, one embodiment of an apparatus comprises: a first plurality of graphics processing resources to execute graphics commands and process graphics data; a first memory management unit (MMU) to communicatively couple the first plurality of graphics processing resources to a system-level MMU to access a system memory; a second plurality of graphics processing resources to execute graphics commands and process graphics data; a second MMU to communicatively couple the second plurality of graphics processing resources to the first MMU; wherein the first MMU is configured as a master MMU having a direct connection to the system-level MMU and the second MMU comprises a slave MMU configured to send memory transactions to the first MMU, the first MMU either servicing a memory transaction or sending the memory transaction to the system-level MMU on behalf of the second MMU.
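The master/slave topology can be sketched as follows: the slave MMU forwards every transaction to the master, and the master services it locally when it can, otherwise going to the system-level MMU on the slave's behalf. The TLB-lookup structure and the translation stub are assumptions for illustration:

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct SystemMMU {
    uint64_t translate(uint64_t va) { return va | 0x8000000000ULL; } // stand-in walk
};

struct MasterMMU {
    SystemMMU* sys;
    std::unordered_map<uint64_t, uint64_t> tlb;
    uint64_t service(uint64_t va) {
        auto hit = tlb.find(va);
        if (hit != tlb.end()) return hit->second;   // serviced locally
        uint64_t pa = sys->translate(va);           // forwarded to system-level MMU
        tlb[va] = pa;
        return pa;
    }
};

struct SlaveMMU {
    MasterMMU* master;
    uint64_t service(uint64_t va) { return master->service(va); } // always forward
};

int main() {
    SystemMMU sys;
    MasterMMU master{&sys, {}};
    SlaveMMU slave{&master};
    std::printf("pa = 0x%llx\n", (unsigned long long)slave.service(0x1000));
    std::printf("pa = 0x%llx (master TLB hit)\n",
                (unsigned long long)slave.service(0x1000));
}
```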
Abstract:
One embodiment provides for a parallel processor comprising a processing array within the parallel processor, the processing array including multiple compute blocks, each compute block including multiple processing clusters configured for parallel operation, wherein each of the multiple compute blocks is independently preemptable. In one embodiment a preemption hint can be generated for source code during compilation to enable a compute unit to determine an efficient point for preemption.
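One way a compiler-generated preemption hint could play out: the compiler marks an efficient point (here, the top of each loop iteration, where live state is minimal) and the compute block polls a preemption flag only there. The flag and kernel below are illustrative assumptions:

```cpp
#include <atomic>
#include <cstdio>

std::atomic<bool> preempt_requested{false};

void kernel(int items) {
    for (int i = 0; i < items; ++i) {
        if (preempt_requested.load(std::memory_order_relaxed)) { // hinted point
            std::printf("preempted cleanly at item %d\n", i);
            return;                       // little context to save here
        }
        /* ... per-item work ... */
    }
    std::printf("kernel ran to completion\n");
}

int main() {
    preempt_requested = true;             // simulate a preemption request
    kernel(1000);
}
```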
Abstract:
One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module to enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space is shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
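The coherency idea can be sketched with a simple invalidate-on-write model: one shared virtual address space backs both caches, and a coherency module invalidates the peer's copy on any write so neither side reads a stale line. The protocol details below are illustrative, not the patent's mechanism:

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct Cache { std::unordered_map<uint64_t, int> lines; };

struct CoherencyModule {
    Cache* gpu_cache;
    Cache* cpu_cache;
    std::unordered_map<uint64_t, int> memory;     // shared virtual address space

    void write(Cache* writer, uint64_t va, int value) {
        Cache* peer = (writer == gpu_cache) ? cpu_cache : gpu_cache;
        peer->lines.erase(va);                    // invalidate peer's copy
        writer->lines[va] = value;
        memory[va] = value;
    }
    int read(Cache* reader, uint64_t va) {
        auto hit = reader->lines.find(va);
        if (hit != reader->lines.end()) return hit->second;
        return reader->lines[va] = memory[va];    // fill from shared memory
    }
};

int main() {
    Cache gpu, cpu;
    CoherencyModule coh{&gpu, &cpu, {}};
    coh.write(&cpu, 0x40, 7);                     // CPU writes
    std::printf("GPU reads %d\n", coh.read(&gpu, 0x40)); // sees 7, not stale data
}
```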
Abstract:
In an embodiment, an apparatus includes: a first downstream port to couple to a first peer device; a second downstream port to couple to a second peer device; and a peer-to-peer (PTP) circuit to receive a memory access request from the first peer device, the memory access request having a target associated with the second peer device, where the PTP circuit is to convert the memory access request from a coherent protocol to a memory protocol and send the converted memory access request to the second peer device. Other embodiments are described and claimed.
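The conversion step can be sketched as rewriting a coherent-protocol request into a plain memory-protocol request before routing it out the second downstream port. The field names are assumptions for illustration, not taken from any protocol specification:

```cpp
#include <cstdint>
#include <cstdio>

struct CoherentReq { uint64_t addr; bool is_write; uint8_t cache_state_hint; };
struct MemoryReq   { uint64_t addr; bool is_write; };

MemoryReq convert(const CoherentReq& req) {
    // Coherence metadata (e.g., the state hint) is dropped; only the
    // address and read/write intent survive in the memory protocol.
    return MemoryReq{req.addr, req.is_write};
}

int main() {
    CoherentReq in{0x2000, /*is_write=*/false, /*cache_state_hint=*/2};
    MemoryReq out = convert(in);                  // forwarded to the peer device
    std::printf("memory %s at 0x%llx\n", out.is_write ? "write" : "read",
                (unsigned long long)out.addr);
}
```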