Abstract:
A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.
Abstract:
Embodiments described herein provide techniques to maintain high temporal cache locality between independent threads having the same or similar memory access pattern. One embodiment provides a graphics processing unit comprising an instruction execution pipeline including hardware execution logic and a thread dispatcher to process a set of commands for execution and distribute multiple groups of hardware threads to the hardware execution logic to execute the set of commands. The thread dispatcher can be configured to concurrently distribute a first group of the multiple groups of hardware threads to the hardware execution logic and withhold distribution of additional hardware threads for the set of commands until after the first group completes execution.
Abstract:
One embodiment provides for a memory system comprising a cache memory and a cache control circuit to receive a request to perform a partial cache line write to a first cache line of the cache memory, merge the request to perform the partial cache line write with a pending request to write to the first cache line, and process a merged request as a full cache line write.
Abstract:
A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.