Abstract:
Described herein are computer graphics technologies that facilitate effective and efficient memory handling for blocks of memory, including texture maps. More particularly, one or more implementations described herein facilitate hierarchical lossless compression of memory, with null-data support, for memory resources including texture maps. More particularly still, one or more implementations described herein facilitate the use of metadata for lossless compression and support for null encodings for Tiled Resources. This technology also permits use of the fast-clear compression method, in which the metadata specifies that the entire access returns a specified clear value.
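The per-block metadata states described above (null, fast clear, losslessly compressed) can be illustrated with a minimal sketch. It assumes a two-level hierarchy: a tile-level "mapped" flag over per-block metadata. The state names, `BLOCK_SIZE`, and the "all texels equal" encoding are hypothetical stand-ins; real hardware uses richer lossless schemes.

```python
from enum import Enum

BLOCK_SIZE = 4  # texels per block (hypothetical)


class State(Enum):
    NULL = "null"          # no data: reads return 0 (null encoding)
    CLEAR = "clear"        # fast clear: reads return the clear value
    CONSTANT = "constant"  # lossless: every texel equals one stored value
    RAW = "raw"            # uncompressed texel data


class Block:
    def __init__(self):
        self.state = State.NULL
        self.payload = None

    def write(self, texels):
        # Lossless encoding: collapse a block whose texels are all equal.
        if all(t == texels[0] for t in texels):
            self.state, self.payload = State.CONSTANT, texels[0]
        else:
            self.state, self.payload = State.RAW, list(texels)

    def read(self, i, clear_value):
        # The metadata state alone decides how the access is satisfied.
        if self.state is State.NULL:
            return 0
        if self.state is State.CLEAR:
            return clear_value
        if self.state is State.CONSTANT:
            return self.payload
        return self.payload[i]


class Tile:
    """Top level of the hierarchy: an unmapped tile overrides all block metadata."""

    def __init__(self, num_blocks, clear_value=0):
        self.mapped = False
        self.clear_value = clear_value
        self.blocks = [Block() for _ in range(num_blocks)]

    def fast_clear(self):
        # Only metadata is touched; no texel data is written.
        self.mapped = True
        for b in self.blocks:
            b.state, b.payload = State.CLEAR, None

    def write_block(self, index, texels):
        self.mapped = True
        self.blocks[index].write(texels)

    def read(self, texel):
        if not self.mapped:  # null tile: no backing memory, read returns 0
            return 0
        block = self.blocks[texel // BLOCK_SIZE]
        return block.read(texel % BLOCK_SIZE, self.clear_value)
```

Reads of a null tile or a fast-cleared block never touch texel storage, which is where the bandwidth savings in this kind of scheme come from.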
Abstract:
Techniques related to graphics rendering, including techniques for improved multi-sampling anti-aliasing compression by use of unreachable bit combinations, are described. A computer-implemented method for providing anti-aliasing in graphics rendering includes determining bit combinations for individual pixels of a tile of pixels and transforming at least one of the bit combinations into an unreachable bit combination.
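One way an "unreachable" bit combination can arise, and be reused, is shown in the sketch below. It rests on assumptions the abstract does not state: 4 samples per pixel, at most two colors per pixel, one index bit per sample, and colors numbered in order of first appearance. Under that numbering, sample 0 always references color 0, so any combination with bit 0 set can never be produced. The sketch repurposes such combinations to carry a hypothetical one-bit flag by inverting the mask, which is reversible; what the unreachable encoding signals in the original technique is not specified here.

```python
SAMPLES = 4
MASK_ALL = (1 << SAMPLES) - 1


def encode_indices(sample_colors):
    """Build the per-pixel bit combination: bit s holds the color index of sample s.

    Colors are numbered in order of first appearance (an assumption), so
    sample 0 always gets index 0 and bit 0 of a reachable combination is 0.
    """
    palette = []
    mask = 0
    for s, color in enumerate(sample_colors):
        if color not in palette:
            palette.append(color)
        idx = palette.index(color)
        if idx > 1:
            raise ValueError("sketch supports at most two colors per pixel")
        mask |= idx << s
    return mask, palette


def is_reachable(mask):
    return mask & 1 == 0


def set_flag(mask):
    """Transform a reachable combination into an unreachable one (bit 0 set)."""
    assert is_reachable(mask)
    return ~mask & MASK_ALL


def decode(stored):
    """Return (original bit combination, flag) from a stored combination."""
    if is_reachable(stored):
        return stored, False
    return ~stored & MASK_ALL, True
```

Because inversion is its own inverse, the extra state costs no additional storage bits per pixel.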
Abstract:
Apparatus and method for efficient graphics processing including ray tracing. For example, one embodiment of a graphics processor comprises: execution hardware logic to execute graphics commands and render images; an interface to couple functional units of the execution hardware logic to a tiled resource; and a tiled resource manager to manage access by the functional units to the tiled resource, a functional unit of the execution hardware logic to generate a request with a hash identifier (ID) to request access to a portion of the tiled resource, wherein the tiled resource manager is to determine whether a portion of the tiled resource identified by the hash ID exists, and if not, to allocate a new portion of the tiled resource and associate the new portion with the hash ID.
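The lookup-or-allocate behavior of the tiled resource manager can be sketched as a dictionary keyed by hash ID over a free list of portions. The class name, the integer portion indices, and the `MemoryError` on exhaustion are assumptions for illustration, not details from the abstract.

```python
class TiledResourceManager:
    """Hypothetical manager mapping hash IDs to portions of a tiled resource."""

    def __init__(self, num_portions):
        self.free = list(range(num_portions))  # indices of unallocated portions
        self.by_hash = {}  # hash ID -> portion index

    def request(self, hash_id):
        """Return (portion index, newly_allocated) for a functional unit's request."""
        portion = self.by_hash.get(hash_id)
        if portion is not None:
            return portion, False  # portion already exists for this hash ID
        if not self.free:
            raise MemoryError("tiled resource exhausted")
        portion = self.free.pop(0)  # allocate a new portion...
        self.by_hash[hash_id] = portion  # ...and associate it with the hash ID
        return portion, True
```

Repeated requests with the same hash ID resolve to the same portion, so functional units can share data without coordinating allocation themselves.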
Abstract:
Embodiments are generally directed to compute optimization in graphics processing. An embodiment of an apparatus includes one or more processors, including a multi-tile graphics processing unit (GPU) to process data, the multi-tile GPU including multiple processor tiles; and a memory for storage of data for processing. The apparatus is to receive compute work for processing by the GPU, partition the compute work into multiple work units, assign each of the work units to one of the processor tiles, and process the compute work using the processor tiles assigned to the work units.
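The partition, assign, and process steps can be sketched sequentially as follows. The fixed unit size and the round-robin assignment policy are assumptions; the abstract does not say how work units are sized or how tiles are chosen, and a real GPU would run the tiles concurrently.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    start: int  # first item index (inclusive)
    end: int    # last item index (exclusive)


def partition(total_items, unit_size):
    """Split compute work into contiguous work units of at most unit_size items."""
    return [WorkUnit(s, min(s + unit_size, total_items))
            for s in range(0, total_items, unit_size)]


def assign(units, num_tiles):
    """Assign each work unit to one processor tile (round-robin, an assumption)."""
    schedule = {tile: [] for tile in range(num_tiles)}
    for i, unit in enumerate(units):
        schedule[i % num_tiles].append(unit)
    return schedule


def process(data, schedule, kernel):
    """Simulate each tile running the kernel over its assigned work units."""
    out = [None] * len(data)
    for tile, units in schedule.items():
        for unit in units:
            for i in range(unit.start, unit.end):
                out[i] = kernel(data[i])
    return out
```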
Abstract:
Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiments described herein provide techniques to skip computational operations for zero-filled matrices and sub-matrices. Embodiments additionally provide techniques to maintain data compression through to a processing unit. Embodiments additionally provide an architecture for a sparse-aware logic unit.
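Skipping work for zero-filled sub-matrices can be sketched as a tiled matrix multiply that tests each operand tile before multiplying it. This is a software illustration only; the tile size of 2 is an assumption, and it does not model the systolic array or the compressed data path.

```python
def block_is_zero(mat, r0, c0, size):
    """True if the size x size tile of mat starting at (r0, c0) is all zeros."""
    return all(mat[r][c] == 0
               for r in range(r0, min(r0 + size, len(mat)))
               for c in range(c0, min(c0 + size, len(mat[0]))))


def sparse_block_matmul(a, b, size=2):
    """Compute C = A @ B tile by tile, skipping any tile product with a zero operand.

    Returns (C, number of skipped tile products).
    """
    n, inner, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    skipped = 0
    for i0 in range(0, n, size):
        for j0 in range(0, m, size):
            for k0 in range(0, inner, size):
                # A zero tile in either operand contributes nothing to C.
                if block_is_zero(a, i0, k0, size) or block_is_zero(b, k0, j0, size):
                    skipped += 1
                    continue
                for i in range(i0, min(i0 + size, n)):
                    for j in range(j0, min(j0 + size, m)):
                        acc = 0
                        for k in range(k0, min(k0 + size, inner)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] += acc
    return c, skipped
```

For a 4x4 matrix with a single non-zero 2x2 tile multiplied by the identity, only 1 of the 8 tile products is computed.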
Abstract:
An apparatus to facilitate synchronizing encrypted workloads across multiple graphics processing units (GPUs) is disclosed. The apparatus includes a memory and one or more processors of a plurality of GPUs, the one or more processors communicably coupled to the memory. The one or more processors are to receive a license associated with the encrypted workload, the license comprising a private content key corresponding to a secure compute application generating the encrypted workload; encrypt the private content key with a first key to generate a session key, the first key being shared among graphics security controllers (GSCs) of the plurality of GPUs; and inject the session key into a region of the memory that is shared among the plurality of GPUs.
Abstract:
Apparatuses including general-purpose graphics processing units having on-chip dense memory for temporal buffering are disclosed. In one embodiment, a graphics multiprocessor includes a plurality of compute engines to perform first computations to generate a first set of data, a cache for storing data, and a high-density memory that is integrated on chip with the plurality of compute engines and the cache. The high-density memory is to receive the first set of data, temporarily store it, and provide it to the cache during a first time period that precedes a second time period in which the plurality of compute engines will use the first set of data for second computations.
Abstract:
Methods and apparatus relating to techniques to improve latency and bandwidth efficiency for read-modify-write operations, when a read operation is requested to a partially modified write-only cacheline, are described. In an embodiment, a first cache stores data from one or more cachelines of a second cache in response to a read hit on a write-only cacheline (e.g., instead of sending the data to main memory). Write accumulate logic merges the stored data with one or more write operations. Other embodiments are also disclosed and claimed.
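The merge performed by the write accumulate logic can be sketched with a per-byte valid mask: bytes written to the write-only line take precedence, and all other bytes come from the data fetched into the first cache. The 8-byte line size and the names below are hypothetical, and the sketch models only the merge step, not the cache hierarchy.

```python
LINE_SIZE = 8  # bytes per cacheline (hypothetical; real lines are often 64 bytes)


class WriteOnlyLine:
    """A partially modified write-only cacheline with a per-byte valid mask."""

    def __init__(self):
        self.data = bytearray(LINE_SIZE)
        self.valid = [False] * LINE_SIZE  # True where the byte has been written

    def write(self, offset, payload):
        for i, byte in enumerate(payload):
            self.data[offset + i] = byte
            self.valid[offset + i] = True


def write_accumulate_merge(fetched, line):
    """Merge written bytes over the fetched line data; written bytes win."""
    merged = bytearray(fetched)
    for i in range(LINE_SIZE):
        if line.valid[i]:
            merged[i] = line.data[i]
    return bytes(merged)
```

Serving the read from the merged line avoids a round trip that would otherwise flush the partial write to main memory and read the line back.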
Abstract:
Embodiments provide for a graphics processing apparatus comprising render logic to detect rendering operations that will result in a framebuffer holding the same data as the initial clear color value, and to convert such rendering operations into the optimizations typically applied to the initial clearing of the framebuffer.
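A detection rule of this kind might look like the sketch below. It assumes the driver knows whether a draw covers the whole render target, whether its shader outputs a known constant color, and whether blending is enabled. The detection criteria, the three outcomes ("fast_clear", "skip", "render"), and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Color = Tuple[float, float, float, float]


@dataclass
class DrawCall:
    covers_full_target: bool         # e.g. a full-screen quad with no scissor
    constant_color: Optional[Color]  # known constant shader output, or None
    blending_enabled: bool


@dataclass
class Framebuffer:
    clear_color: Color
    in_clear_state: bool = True  # starts out cleared to clear_color


def classify_draw(draw, fb):
    """Decide how to handle a draw; updates fb.in_clear_state conservatively."""
    writes_clear = (draw.constant_color == fb.clear_color
                    and not draw.blending_enabled)
    if writes_clear and draw.covers_full_target:
        fb.in_clear_state = True
        return "fast_clear"  # metadata-only clear instead of shading pixels
    if writes_clear and fb.in_clear_state:
        return "skip"        # pixels already hold the clear color
    fb.in_clear_state = False
    return "render"
```

The rule is conservative: any draw that cannot be proven to produce the clear color is rendered normally and drops the framebuffer out of its cleared state.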
Abstract:
Techniques related to graphics rendering, including lossy color merge for multi-sampling anti-aliasing compression, are discussed. A method for providing anti-aliasing in graphics rendering includes determining two or more colors associated with a plurality of color samples of a pixel; determining that a first color and a second color of the two or more colors are substantially similar; merging the first color and the second color to form a merged color; and replacing the first color and the second color with the merged color.
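The merge step can be sketched as follows, assuming "substantially similar" means every channel differs by at most a threshold and the merged color is the unweighted per-channel integer average. Both the similarity test and the averaging are assumptions; the abstract does not define either.

```python
def similar(a, b, threshold):
    """Colors are 'substantially similar' if no channel differs by more than threshold."""
    return max(abs(x - y) for x, y in zip(a, b)) <= threshold


def merge_similar_colors(samples, threshold):
    """Repeatedly merge a similar pair of distinct colors among a pixel's samples.

    Each merge removes at least one distinct color, so the loop terminates.
    """
    samples = list(samples)
    while True:
        colors = list(dict.fromkeys(samples))  # distinct colors, first-appearance order
        pair = next(((a, b) for i, a in enumerate(colors) for b in colors[i + 1:]
                     if similar(a, b, threshold)), None)
        if pair is None:
            return samples
        a, b = pair
        merged = tuple((x + y) // 2 for x, y in zip(a, b))
        # Replace both original colors with the merged color in every sample.
        samples = [merged if s in (a, b) else s for s in samples]
```

Fewer distinct colors per pixel means fewer color entries to store, which is what makes the merge useful for compression at the cost of a small, bounded color error.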