Abstract:
In accordance with some embodiments, a command streamer may use a cache of programmable size to cache commands to improve memory bandwidth and reduce latency. The size of the command cache may be programmably set by the command streamer.
Abstract:
The same set of render commands can be re-executed for each of a plurality of tiles making up a graphic scene to be rendered. Each time the list of commands is executed, the way the commands are executed may be modified based on information received from tile pre-processing. Specifically, a jump if command may be inserted into the command list. When this command is encountered, a determination is made, based on information received from tile pre-processing pipeline, whether to execute the command for the next primitive or not. If the next primitive is to be culled then the command for the next primitive is not executed and the flow moves past that command. If the next primitive is to be executed then the jump is not implemented. This enables avoiding reloading the same list of commands over and over for every tile.
Abstract:
A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.
Abstract:
Embodiments described herein provide a technique to enable access to entries in a surface state or sampler state using 64-bit virtual addresses. One embodiment provides a graphics core that includes memory access circuitry configured to facilitate access to the memory by functional units of the graphics core. The memory access circuitry is configured to receive a message to access an entry in a surface state or a sampler state associated with a parallel processing operation. The message specifies a base address for a surface state entry or sampler state entry. The circuitry can add the base address and the offset to determine a 64-bit virtual address for the entry in the surface state entry or the sampler state and submit a memory access request to the memory to access the entry of the surface state or sampler state.
Abstract:
Graphics processors for implementing multi-tile memory management are disclosed. In one embodiment, a graphics processor includes a first graphics device having a local memory, a second graphics device having a local memory, and a graphics driver to provide a single virtual allocation with a common virtual address range to mirror a resource to each local memory of the first and second graphics devices.
Abstract:
Described herein is a cloud-based gaming system in which multiple views of a spectated E-sports event can be rendered and combined into an immersive video having at least three degrees of freedom. Low-latency generation of the immersive video is facilitated via the use of GPU-controlled non-volatile memory on which rendered data for multiple viewpoints are stored.
Abstract:
An embodiment of a graphics command coordinator apparatus may include a commonality identifier to identify a commonality between a first graphics command corresponding to a first frame and a second graphics command corresponding to a second frame, a commonality analyzer communicatively coupled to the commonality identifier to determine if the first graphics command and the second graphics command can be processed together based on the commonality identified by the commonality identifier, and a commonality indicator communicatively coupled to the commonality analyzer to provide an indication that the first graphics command and the second graphics command are to be processed together. Other embodiments are disclosed and claimed.
Abstract:
Graphics processors for implementing multi-tile memory management are disclosed. In one embodiment, a graphics processor includes a first graphics device having a local memory, a second graphics device having a local memory, and a graphics driver to provide a single virtual allocation with a common virtual address range to mirror a resource to each local memory of the first and second graphics devices.
Abstract:
Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
Abstract:
Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.