Abstract:
Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
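The data flow in this abstract (result, then pre-fetch information, then prefetch addresses) can be illustrated with a minimal sketch. It is not the claimed design: the 64-byte line size, the `hint_stride` and `hint_count` parameters standing in for "software assistance", and both function names are assumptions made for illustration.

```python
from typing import List, Optional

CACHE_LINE_BYTES = 64  # assumed cache line size, not stated in the abstract


def prefetch_info(result_addr: int, hint_stride: Optional[int] = None,
                  hint_count: int = 2) -> dict:
    """Load-store unit step: combine a computed address with an optional
    software-supplied stride hint (hypothetical form of software assistance)."""
    stride = hint_stride if hint_stride is not None else CACHE_LINE_BYTES
    return {"base": result_addr, "stride": stride, "count": hint_count}


def prefetch_addresses(info: dict) -> List[int]:
    """Prefetch generator step: expand the info into line-aligned addresses."""
    addrs = []
    for i in range(1, info["count"] + 1):
        addr = info["base"] + i * info["stride"]
        addrs.append(addr - addr % CACHE_LINE_BYTES)  # align down to a line
    return addrs
```

With a software hint of 256 bytes, `prefetch_addresses(prefetch_info(0x1000, 256))` yields the two lines at `0x1100` and `0x1200`; without a hint the sketch falls back to next-line prefetching.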
Abstract:
Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment provides for virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogeneous processing system having near and far regions of the same level of a cache hierarchy.
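As a rough illustration of translating addresses when 4 KB and 64 KB pages coexist in one table, the sketch below models a single page table as a dictionary. The table layout (entries keyed by 4 KB-granular virtual page number, with a 64 KB entry stored at its 16-entry-aligned slot) and the `translate` function are assumptions for illustration; the abstract does not describe the entry format.

```python
PAGE_4K = 4 * 1024
PAGE_64K = 64 * 1024
SLOTS_PER_64K = PAGE_64K // PAGE_4K  # 16 4 KB slots per 64 KB page


def translate(page_table: dict, va: int) -> int:
    """Translate a virtual address using a table that mixes page sizes.

    page_table maps a 4 KB-granular virtual page number to a tuple
    (physical_base, page_size). This layout is hypothetical.
    """
    vpn = va // PAGE_4K
    candidates = (
        (vpn, PAGE_4K),                          # exact 4 KB entry
        (vpn - vpn % SLOTS_PER_64K, PAGE_64K),   # enclosing 64 KB entry
    )
    for key, size in candidates:
        entry = page_table.get(key)
        if entry is not None and entry[1] == size:
            return entry[0] + va % size  # base plus offset within the page
    raise KeyError(f"no mapping for virtual address {va:#x}")
```

The offset width follows the page size: 12 bits are kept for a 4 KB page and 16 bits for a 64 KB page, which is the main practical difference between the two paths.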
Abstract:
Embodiments are generally directed to multi-tile graphics processor rendering. An embodiment of an apparatus includes a memory for storage of data; and one or more processors including a graphics processing unit (GPU) to process data, wherein the GPU includes a plurality of GPU tiles, wherein, upon geometric data being assigned to each of a plurality of screen tiles, the apparatus is to transfer the geometric data to the plurality of GPU tiles.
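One way to picture the assignment of geometry to screen tiles and its transfer to GPU tiles is the sketch below. It bins triangles by bounding box and maps screen tiles to GPU tiles round-robin; both choices, and the `distribute_triangles` name and its parameters, are assumptions rather than anything stated in the abstract.

```python
from collections import defaultdict


def distribute_triangles(triangles, tile_size, tiles_x, tiles_y, num_gpu_tiles):
    """Bin triangles into screen tiles by bounding box, then hand each
    screen tile's triangles to a GPU tile (round-robin mapping, assumed)."""
    work = defaultdict(list)  # GPU tile index -> list of triangle ids
    for tri_id, verts in enumerate(triangles):
        xs = [v[0] for v in verts]
        ys = [v[1] for v in verts]
        # Range of screen tiles touched by the bounding box, clamped to the screen.
        tx0 = max(0, int(min(xs)) // tile_size)
        tx1 = min(tiles_x - 1, int(max(xs)) // tile_size)
        ty0 = max(0, int(min(ys)) // tile_size)
        ty1 = min(tiles_y - 1, int(max(ys)) // tile_size)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                gpu_tile = (ty * tiles_x + tx) % num_gpu_tiles
                if tri_id not in work[gpu_tile]:
                    work[gpu_tile].append(tri_id)
    return dict(work)
```

A triangle that straddles a screen-tile boundary is sent to every GPU tile that owns one of the tiles it touches, so the same triangle id can appear in more than one list.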
Abstract:
A graphics processing apparatus and method are described. For example, one embodiment of a graphics processing apparatus comprises: an input assembler of a graphics pipeline to determine a first set of triangles to be drawn based on application-provided parameters; a depth buffer to store depth data related to the first set of triangles; a vertex shader to perform position-only vertex shading operations on the first set of triangles in response to an indication that the graphics pipeline is to initially operate in a depth-only mode; a culling and clipping module to read depth values from the depth buffer to identify those triangles in the first set of triangles which are fully occluded by other objects in a current frame and to generate culling data usable to cull occluded triangles, the culling and clipping module to associate the culling data with a replay token to be used to identify a subsequent rendering pass through the graphics pipeline; the input assembler, upon detecting the replay token in the subsequent rendering pass, to access the culling data associated therewith to remove culled triangles from the first set of triangles to generate a second set of triangles; the vertex shader to perform full vertex shading operations on the second set of triangles during the subsequent rendering pass, the replay token to be destroyed during or following the subsequent rendering pass.
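The two-pass flow with a replay token can be sketched as follows. This is a heavily simplified model, not the claimed pipeline: each triangle is reduced to a list of covered pixels and its nearest depth, smaller depth is assumed to mean closer, the depth buffer is assumed to be populated already, and `culling_store`, `depth_only_pass`, and `replay_pass` are hypothetical names.

```python
import itertools

_next_token = itertools.count(1)
culling_store = {}  # replay token -> set of culled triangle ids


def depth_only_pass(triangles: dict, depth_buffer: dict) -> int:
    """First pass: mark triangles whose every covered pixel already holds a
    closer depth, and file the result under a fresh replay token."""
    culled = {
        tri_id
        for tri_id, (pixels, nearest_z) in triangles.items()
        if all(depth_buffer[p] < nearest_z for p in pixels)
    }
    token = next(_next_token)
    culling_store[token] = culled
    return token


def replay_pass(triangles: dict, token: int) -> list:
    """Second pass: drop culled triangles, then destroy the token."""
    culled = culling_store.pop(token)  # token is consumed here
    return [tri_id for tri_id in triangles if tri_id not in culled]
```

Only the triangles returned by `replay_pass` would go on to full vertex shading; removing the token from `culling_store` mirrors the statement that the token is destroyed during or after the subsequent pass.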
Abstract:
Embodiments provide for a graphics processing apparatus including a cache memory and logic coupled to the cache memory to compress color data output from the cache memory. In one embodiment the cache memory is a render cache. In one embodiment the cache memory is a victim data cache. In one embodiment the cache memory is a render cache coupled to a victim data cache, and the logic is configured to compress color data evicted from the render cache and the victim data cache. The compression can include a target compression ratio to which the data is to be compressed.
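The idea of compressing evicted color data toward a target ratio can be sketched in software. The example uses Python's `zlib` purely as a stand-in for whatever hardware codec an implementation would use, and the fall-back to raw storage when the target is missed is an assumption; `evict_color_line` is a hypothetical name.

```python
import zlib


def evict_color_line(data: bytes, target_ratio: int = 2):
    """Compress an evicted cache line; keep the compressed form only if it
    meets the target ratio (e.g. 2 means at most half the original size)."""
    budget = len(data) // target_ratio
    packed = zlib.compress(data)  # stand-in codec, not a GPU color codec
    if len(packed) <= budget:
        return ("compressed", packed)
    return ("raw", data)  # target missed: store uncompressed (assumed policy)
```

A line of uniform color, such as 256 zero bytes, easily meets a 2:1 target, whereas a line with no repeated values does not and is passed through unchanged.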
Abstract:
An apparatus and method are described for performing efficient depth test operations. For example, an apparatus in accordance with one embodiment comprises: a depth cache to store a plurality of cache lines containing depth data to be used for graphics processing operations; depth test logic to determine a current depth test function associated with a read operation and to read a cache line from a depth cache while there are still outstanding writes to the cache line if the read operation and write operation are associated with the same depth test function, the depth test logic to perform a first depth test using the data read from the cache line, the first depth test to fail or pass pixels based on a predicted range of depth values.
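One possible reading of a depth test against a "predicted range" is sketched below for a less-than depth function. The reasoning is an assumption, not taken from the abstract: if every outstanding write uses the same less-than function, the stored depth can only decrease, so the final value lies between the lowest depth any pending write could store and the stale value just read. The function name and the `pending_min_z` parameter are hypothetical.

```python
def early_depth_test_less(frag_z: float, stale_z: float, pending_min_z: float) -> str:
    """Conservative early test for a LESS depth function while writes to the
    same cache line are still outstanding.

    stale_z        depth read from the line before pending writes land
    pending_min_z  lowest depth any pending write could store (assumed known)
    """
    upper = stale_z                       # final depth cannot exceed this
    lower = min(stale_z, pending_min_z)   # final depth cannot go below this
    if frag_z >= upper:
        return "fail"    # fails against every depth in the predicted range
    if frag_z < lower:
        return "pass"    # passes against every depth in the predicted range
    return "retry"       # ambiguous: wait for the outstanding writes
```

Fragments that land inside the range cannot be resolved early under this model and would need a second test once the writes complete.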
Abstract:
Systems and methods may provide for receiving a plurality of signals from a software module associated with a shared resource such as, for example, an unordered access view (UAV). The plurality of signals may include a first signal that indicates whether a draw call accesses the shared resource, a second signal that indicates whether a boundary of the draw call has been reached, and a third signal that indicates whether the draw call has a coherency requirement. Additionally, a workload corresponding to the draw call may be selectively dispatched in a shader invocation based on the plurality of signals.
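The three signals can be combined into a dispatch decision in many ways; the abstract does not state the policy. The sketch below shows one plausible rule, stated as an assumption: a draw call that touches the shared resource and requires coherency is held until the draw-call boundary has been reached, and everything else is dispatched immediately. `should_dispatch` is a hypothetical name.

```python
def should_dispatch(accesses_resource: bool, boundary_reached: bool,
                    needs_coherency: bool) -> bool:
    """Decide whether a draw call's workload may be dispatched now.

    Assumed policy: only draws that both access the shared resource
    (e.g. a UAV) and require coherency must wait for the boundary.
    """
    if not accesses_resource or not needs_coherency:
        return True
    return boundary_reached
```

Under this rule, draws without a coherency requirement can overlap freely, which is where the selective dispatch would recover parallelism.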
Abstract:
Methods, systems and apparatuses may provide for technology that identifies first graphics data that is associated with spatially proximate positions. The technology identifies second graphics data that is associated with spatially proximate positions, and interleaves the first and the second graphics data across a plurality of storage tiles.
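A minimal sketch of interleaving two groups of spatially proximate data across storage tiles follows. The round-robin placement with a one-tile offset for the second group is an assumed scheme, chosen so that neighbors within a group and corresponding items across groups land in different tiles; the `interleave` function and its `offset` parameter are hypothetical.

```python
def interleave(first, second, num_tiles, offset=1):
    """Spread two equally sized groups of neighboring items across tiles.

    Item i of the first group goes to tile i % num_tiles; item i of the
    second group goes to tile (i + offset) % num_tiles (assumed scheme).
    """
    if len(first) != len(second):
        raise ValueError("this sketch assumes groups of equal length")
    tiles = [[] for _ in range(num_tiles)]
    for i, (a, b) in enumerate(zip(first, second)):
        tiles[i % num_tiles].append(a)
        tiles[(i + offset) % num_tiles].append(b)
    return tiles
```

With two tiles, `interleave(["a0", "a1"], ["b0", "b1"], 2)` places `a0` and `b1` in one tile and `b0` and `a1` in the other, so accesses to adjacent positions can be served by different tiles in parallel.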
Abstract:
Embodiments are generally directed to a multi-tile architecture for graphics operations. An embodiment of an apparatus includes a multi-tile architecture for graphics operations including a multi-tile graphics processor, wherein the multi-tile graphics processor includes one or more dies; multiple processor tiles installed on the one or more dies; and a structure to interconnect the processor tiles on the one or more dies, wherein the structure is to enable communications between the processor tiles.
Abstract:
Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.