Abstract:
Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution circuitry is to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.
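As a rough software model of these semantics (not the claimed instruction encoding), the index register can be read as a grouping key: input elements that share an index value form one group, and the reduction is applied per group. The vector width and the choice of addition as the reduction are assumptions in this C++ sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the indexed-reduction behavior: input elements whose
// indices match are reduced together (summed here), and each group's result
// is written to the output slot named by that index. Widths are illustrative.
template <std::size_t N>
std::array<float, N> indexed_reduce_add(const std::array<float, N>& values,
                                        const std::array<uint32_t, N>& indices) {
    std::array<float, N> out{};               // output register, zero-initialized
    for (std::size_t i = 0; i < N; ++i) {
        out[indices[i] % N] += values[i];     // accumulate into the group's slot
    }
    return out;
}
```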
Abstract:
Apparatus and method for context-aware compression. For example, one embodiment of an apparatus comprises: ray traversal/intersection circuitry to traverse rays through a hierarchical acceleration data structure to identify intersections between rays and primitives of a graphics scene; matrix compression circuitry/logic to compress hierarchical transformation matrices to generate compressed hierarchical transformation matrices by quantizing N-bit floating point data elements associated with child transforms of the hierarchical transformation matrices to variable-bit floating point numbers or integers comprising offsets from a parent transform of the child transform; and an instance processor to generate a plurality of instances of one or more base geometric objects in accordance with the compressed hierarchical transformation matrices.
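A rough software analogy of the quantization step, with the bit width and scale chosen arbitrarily for illustration: each element of a child transform is stored as a small integer offset from the corresponding element of its parent transform, plus a shared dequantization scale. The actual circuitry targets variable-bit floats or integers; this sketch uses fixed 8-bit offsets.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical illustration only: store each child-transform element as an
// 8-bit integer offset from the parent-transform element, at a shared scale.
struct CompressedTransform {
    int8_t offset[12];   // 3x4 affine matrix, one quantized offset per element
    float  scale;        // dequantization scale shared by all elements
};

CompressedTransform compress_child(const float parent[12], const float child[12]) {
    CompressedTransform c{};
    float max_delta = 0.f;
    for (int i = 0; i < 12; ++i)
        max_delta = std::max(max_delta, std::fabs(child[i] - parent[i]));
    c.scale = (max_delta > 0.f) ? max_delta / 127.f : 1.f;
    for (int i = 0; i < 12; ++i)
        c.offset[i] = static_cast<int8_t>(std::lround((child[i] - parent[i]) / c.scale));
    return c;
}

void decompress_child(const float parent[12], const CompressedTransform& c, float out[12]) {
    for (int i = 0; i < 12; ++i)
        out[i] = parent[i] + c.offset[i] * c.scale;   // reconstruct the child transform
}
```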
Abstract:
Apparatus and method for a hierarchical beam tracer. For example, one embodiment of an apparatus comprises: a beam generator to generate beam data associated with a beam projected into a graphics scene; a bounding volume hierarchy (BVH) generator to generate BVH data comprising a plurality of hierarchically arranged BVH nodes; a hierarchical beam-based traversal unit to determine whether the beam intersects a current BVH node and, if so, to responsively subdivide the beam into N child beams to test against the current BVH node and/or to traverse further down the BVH hierarchy to select a new BVH node, wherein the hierarchical beam-based traversal unit is to iteratively subdivide successive intersecting child beams and/or to continue to traverse down the BVH hierarchy until a leaf node is reached with which at least one final child beam is determined to intersect; the hierarchical beam-based traversal unit to generate a plurality of rays within the final child beam; and intersection hardware logic to perform intersection testing for any rays intersecting the leaf node, the intersection testing to determine intersections between the rays intersecting the leaf node and primitives bounded by the leaf node.
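The control flow sketched below mirrors the traversal described in the abstract: test the beam against a node, subdivide into N child beams and descend, and at a leaf generate rays for intersection testing. The Beam, Ray, and BVHNode contents, the intersection tests, and N = 4 are all placeholder assumptions.

```cpp
#include <vector>

// Structural sketch of the traversal loop; geometric tests are stubbed out.
struct Beam {};
struct Ray {};
struct BVHNode {
    bool is_leaf = false;
    std::vector<const BVHNode*> children;   // child nodes of an inner node
    // primitives bounded by a leaf would live here
};

bool beam_intersects(const Beam&, const BVHNode&) { return true; }        // placeholder test
std::vector<Beam> subdivide(const Beam& b, int n) { return std::vector<Beam>(n, b); }
std::vector<Ray> rays_in_beam(const Beam&) { return {}; }                 // rays for a leaf beam
void intersect_rays(const std::vector<Ray>&, const BVHNode&) {}           // ray/primitive testing

void traverse(const Beam& beam, const BVHNode& node) {
    if (!beam_intersects(beam, node))
        return;                                     // beam misses this subtree entirely
    if (node.is_leaf) {
        intersect_rays(rays_in_beam(beam), node);   // a final child beam reached a leaf
        return;
    }
    for (const Beam& child : subdivide(beam, 4))    // subdivide into N child beams
        for (const BVHNode* next : node.children)
            traverse(child, *next);
}
```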
Abstract:
An apparatus and method are described for performing an early depth test on graphics data. For example, one embodiment of a graphics processing apparatus comprises: early depth test circuitry to perform an early depth test on blocks of pixels to determine whether all pixels in the block of pixels can be resolved by the early depth test; a plurality of execution circuits to execute pixel shading operations on the blocks of pixels; and a scheduler circuit to schedule the blocks of pixels for the pixel shading operations, the scheduler circuit to prioritize the blocks of pixels in accordance with the determination as to whether all pixels in the block of pixels can be resolved by the early depth test.
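One way to picture the scheduling policy is a priority queue keyed on the early depth test outcome for each pixel block. The abstract does not state which outcome is favored, so treating fully resolved blocks as higher priority in this C++ sketch is an assumption.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Minimal model of the scheduling idea: each pixel block carries the outcome
// of the early depth test, and the scheduler orders blocks by that flag.
struct PixelBlock {
    uint32_t id = 0;                           // block identifier
    bool fully_resolved_by_early_z = false;    // set by the early depth test circuitry
};

struct ByEarlyZ {
    bool operator()(const PixelBlock& a, const PixelBlock& b) const {
        // std::priority_queue pops the "largest" element first.
        return a.fully_resolved_by_early_z < b.fully_resolved_by_early_z;
    }
};

using ShaderScheduler =
    std::priority_queue<PixelBlock, std::vector<PixelBlock>, ByEarlyZ>;
```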
Abstract:
Techniques related to graphics rendering, including techniques for compression and/or decompression of graphics data by use of pixel region bit values, are described.
Abstract:
Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment provides for virtual memory address translation using mixed four-kilobyte and sixty-four-kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogeneous processing system having near and far regions of the same level of a cache hierarchy.
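The mixed page size aspect can be illustrated with a toy page-table entry in which a flag selects a 4 KiB or 64 KiB page, so that leaf entries of both sizes can coexist under the same directory. The field layout below is an assumption, not the actual PTE format.

```cpp
#include <cstdint>

// Toy illustration of mixed page sizes under one page-table hierarchy: a leaf
// entry carries a flag selecting a 4 KiB or 64 KiB page, and translation masks
// the virtual-address offset accordingly.
struct PageTableEntry {
    uint64_t phys_base;   // physical base address of the page
    bool     is_64k;      // true: 64 KiB page, false: 4 KiB page
    bool     present;     // entry is valid
};

uint64_t translate(const PageTableEntry& pte, uint64_t virt_addr) {
    const uint64_t offset_mask = pte.is_64k ? 0xFFFFull : 0xFFFull;
    return pte.phys_base | (virt_addr & offset_mask);   // keep the in-page offset
}
```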
Abstract:
Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive data dependencies for one or more tasks comprising one or more producer tasks executing on the first processing resource and one or more consumer tasks executing on the second processing resource, and to move a data output from the one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource. Other embodiments may be described and claimed.
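A conceptual sketch of the dependency-driven movement: once the producer task finishes, its output buffer is staged into the cache attached to the processing resource that will run the consumer task. The Cache type and its push_lines() hook are placeholders invented for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Conceptual sketch only: the producer runs, its output is pushed toward the
// consumer's cache, and then the consumer runs with the data already resident.
struct Cache {
    void push_lines(const void* /*data*/, std::size_t /*bytes*/) { /* warm the cache */ }
};

struct Task {
    std::function<std::vector<float>()> run;   // work that produces or consumes data
};

void run_dependency(Task& producer, Task& consumer, Cache& consumer_cache) {
    std::vector<float> output = producer.run();                 // producer executes first
    consumer_cache.push_lines(output.data(),
                              output.size() * sizeof(float));   // stage output near the consumer
    consumer.run();                                             // consumer finds the data resident
}
```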
Abstract:
A cache streaming apparatus and method for machine learning. For example, one embodiment of an apparatus comprises: a plurality of compute units to perform machine learning operations; a cache subsystem comprising a hierarchy of cache levels, at least some of the cache levels shared by two or more of the plurality of compute units; and data streaming hardware logic to stream machine learning data in and out of the cache subsystem based on the machine learning operations, the data streaming hardware logic to load data into the cache subsystem from memory before the data is needed by a first portion of the machine learning operations and to ensure that results produced by the first portion of the machine learning operations are maintained in the cache subsystem until used by a second portion of the machine learning operations.
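In software terms the streaming behavior resembles a prefetch-ahead loop: while one tile of data is processed, the next tile is pulled toward the caches, and each tile's intermediate result is consumed before it can be evicted. The tile size, the stub kernels, and the use of the GCC/Clang __builtin_prefetch intrinsic are illustrative assumptions.

```cpp
#include <cstddef>

// Software analogy of the streaming behavior: load ahead of first use, then
// consume each tile's result while it is still resident in the cache.
constexpr std::size_t kTile = 1024;

void process_tile(const float* in, float* out, std::size_t n) {   // "first portion" of the work
    for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * 2.0f;    // stand-in computation
}
void consume_tile(const float* /*results*/, std::size_t /*n*/) {} // "second portion" of the work

void stream_through_cache(const float* input, float* scratch, std::size_t n_tiles) {
    for (std::size_t t = 0; t < n_tiles; ++t) {
        if (t + 1 < n_tiles)
            __builtin_prefetch(input + (t + 1) * kTile);          // load ahead of first use
        process_tile(input + t * kTile, scratch, kTile);          // produce intermediate results
        consume_tile(scratch, kTile);                             // use them while still cached
    }
}
```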
Abstract:
A system includes a compression engine that stores compression format information embedded in the compressed data. The compression format information can be included in a header that includes compression control surface (CCS) information. The system includes a shared memory to store compressed data for multiple hardware pipelines, where blocks of the compressed data have a common memory footprint and include the compression header. The compression engine can compress data to store in the shared memory, including generation of the header. The compression engine can decompress data read from the shared memory, including identification of the compression format from the header.
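The idea that the format travels with the data can be sketched as a small header prepended to each compressed block; the decompressor reads the format field from the block itself rather than from out-of-band metadata. The header layout and format codes below are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch only: each compressed block starts with a header carrying CCS-style
// format information, and the decompressor dispatches on that field.
struct CompressionHeader {
    uint8_t format;        // e.g. 0 = stored uncompressed, 1 = lossless, 2 = lossy
    uint8_t reserved[3];
};

std::vector<uint8_t> decompress_block(const uint8_t* block, std::size_t size) {
    if (size < sizeof(CompressionHeader))
        return {};                                         // malformed block
    CompressionHeader hdr{};
    std::memcpy(&hdr, block, sizeof(hdr));                 // format is read from the block itself
    const uint8_t* payload = block + sizeof(hdr);
    const std::size_t payload_size = size - sizeof(hdr);
    switch (hdr.format) {
        case 0: return {payload, payload + payload_size};  // no decompression needed
        case 1: /* lossless decode would go here */ break;
        case 2: /* lossy decode would go here */ break;
    }
    return {};
}
```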