Apparatus and method for efficient prefix sum operation

    Publication No.: US09632979B2

    Publication date: 2017-04-25

    Application No.: US14727826

    Filing date: 2015-06-01

    Abstract: An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit (GPU) comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of SIMD operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.
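The idea of building a prefix sum from rounds of simultaneous additions can be illustrated with a generic log-step scan. The sketch below is not the patent's regioning sequence, which the abstract does not spell out; it is a standard Hillis-Steele style scan in plain Python, and the function name `simd_style_prefix_sum` is chosen here for illustration only. Each loop iteration stands in for one SIMD operation in which every lane adds at the same time.

```python
def simd_style_prefix_sum(values):
    """Inclusive prefix sum built from rounds of lane-wise additions.

    Illustrative only: a Python list models the register, and each
    list comprehension models one SIMD add across all lanes.
    """
    lanes = list(values)
    offset = 1
    while offset < len(lanes):
        # One round: every lane i >= offset adds the value held by lane
        # i - offset. Building a new list means all additions read the
        # register contents from before the round, as a SIMD add would.
        lanes = [
            lanes[i] + lanes[i - offset] if i >= offset else lanes[i]
            for i in range(len(lanes))
        ]
        offset *= 2
    return lanes


result = simd_style_prefix_sum([3, 1, 4, 1, 5, 9, 2, 6])
print(result)  # [3, 4, 8, 9, 14, 23, 25, 31]
```

The first round adds neighbouring input elements to form intermediate results; later rounds add intermediate results to other intermediate results, so eight elements need three rounds rather than seven sequential additions.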

    Hardware instruction set to replace a plurality of atomic operations with a single atomic operation

    Publication No.: US10318292B2

    Publication date: 2019-06-11

    Application No.: US14543027

    Filing date: 2014-11-17

    Abstract: Systems and methods may process a single atomic operation. An instruction set may be generated to replace a plurality of atomic operations with a single atomic operation. The instruction set may include an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values. The instruction set may also include a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values. In one example, a graphics processor may execute the instruction set to process the single atomic operation.
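A minimal sketch of the accumulate-then-broadcast idea follows, under several assumptions that are not in the abstract: the atomic operation is an add that returns the old value, the prefix sum is inclusive, and the `Counter` class with its `atomic_add` method is a hypothetical stand-in for a shared memory location. The sketch compares one atomic add per lane with a single atomic add of the lane total.

```python
from itertools import accumulate


class Counter:
    """Hypothetical stand-in for a memory location with an atomic add."""

    def __init__(self, value=0):
        self.value = value
        self.atomic_ops = 0  # counts how many atomic operations were issued

    def atomic_add(self, amount):
        old = self.value
        self.value += amount
        self.atomic_ops += 1
        return old  # the pre-existing value


def per_lane_atomics(counter, lane_values):
    """Baseline: every lane issues its own atomic add."""
    return [counter.atomic_add(v) + v for v in lane_values]


def single_atomic(counter, lane_values):
    """Accumulate across lanes, issue one atomic add, broadcast the old value."""
    if not lane_values:
        return []
    accumulated = list(accumulate(lane_values))  # inclusive prefix sum (assumed)
    old = counter.atomic_add(accumulated[-1])    # one atomic add of the total
    # Broadcast step: every lane adds the pre-existing value to its own sum.
    return [old + a for a in accumulated]


baseline = Counter(10)
combined = Counter(10)
print(per_lane_atomics(baseline, [2, 5, 1, 4]), baseline.atomic_ops)  # [12, 17, 18, 22] 4
print(single_atomic(combined, [2, 5, 1, 4]), combined.atomic_ops)     # [12, 17, 18, 22] 1
```

Both versions leave the counter at the same final value and give each lane the same result, but the second issues one atomic operation instead of four.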

    FACILITATING DYNAMIC RUNTIME TRANSFORMATION OF GRAPHICS PROCESSING COMMANDS FOR IMPROVED GRAPHICS PERFORMANCE AT COMPUTING DEVICES

    Type: invention application; status: under examination (published)

    Publication No.: US20160364828A1

    Publication date: 2016-12-15

    Application No.: US14738679

    Filing date: 2015-06-12

    Abstract: A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.
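The flow in this abstract (detect dispatches, evaluate them into a first plan, derive a second plan, execute the second plan instead of the original stream) can be sketched as follows. The abstract does not say what the transformation is, so everything concrete here is an assumption made for illustration: the `Dispatch` fields, the `small` threshold, and the choice to merge adjacent small dispatches of the same kernel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dispatch:
    kernel: str      # hypothetical: kernel launched by this dispatch
    work_items: int  # hypothetical processing parameter


def build_first_plan(stream, small=64):
    """Evaluate each dispatch and tag it 'merge' (small) or 'keep'."""
    return [("merge" if d.work_items <= small else "keep", d) for d in stream]


def build_second_plan(first_plan):
    """Derive the transformed stream from the first plan.

    Assumed transformation: fold runs of adjacent 'merge' dispatches
    that launch the same kernel into one larger dispatch.
    """
    transformed = []
    last_merged = False
    for action, d in first_plan:
        prev = transformed[-1] if transformed else None
        if action == "merge" and last_merged and prev.kernel == d.kernel:
            transformed[-1] = Dispatch(d.kernel, prev.work_items + d.work_items)
        else:
            transformed.append(d)
        last_merged = action == "merge"
    return transformed


def execute(plan):
    """Stand-in for submitting dispatches; returns a log of what ran."""
    return [f"{d.kernel}x{d.work_items}" for d in plan]


stream = [Dispatch("blur", 32), Dispatch("blur", 16),
          Dispatch("blur", 512), Dispatch("tone", 8)]
second_plan = build_second_plan(build_first_plan(stream))
print(execute(second_plan))  # ['blurx48', 'blurx512', 'tonex8']
```

The original stream is never executed; only the second plan is, which matches the "in lieu of" wording of the abstract.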