Abstract:
An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.
Abstract:
A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.
Abstract:
Systems and methods may process a single atomic operation. An instruction set may be generated to replace a plurality of atomic operations with a single atomic operation. The instruction set may include an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values. The instruction set may also include a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values. In one example, a graphics processor may execute the instruction set to process the single atomic operation.
Abstract:
A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.