Patent search ap:("INTEL CORPORATION") AND inv:"Thomas F. Raoux" Page 1

1.

Granted patent
Apparatus and method for efficient prefix sum operation [Active]

Publication No.: US09632979B2

Publication date: 2017-04-25

Application No.: US14727826

Filing date: 2015-06-01

Applicant: Intel Corporation

Inventor: Satyajit Sarangi, Thomas F. Raoux

IPC: G06F15/80, G06F9/30, G06T1/20

CPC classification number: G06F15/8007, G06F9/3001, G06F9/30036, G06F9/30101, G06F9/3851, G06T1/20, G06T2200/28

Abstract: An apparatus and method are described for performing a prefix sum. For example, one embodiment of an apparatus comprises: a graphics processor unit comprising one or more execution units to execute single instruction multiple data (SIMD) instructions, the GPU to be provided with a plurality of data elements as input for a prefix sum operation; a first register of the GPU to store the plurality of data elements in specified data element positions; and the one or more execution units to perform a series of single instruction multiple data (SIMD) operations using the plurality of data elements, the SIMD operations performed using regioning techniques to generate the prefix sum, the SIMD operations including a first plurality of simultaneous addition operations to add specified data elements to generate intermediate results and further including a second plurality of simultaneous addition operations to add the intermediate results to other intermediate results to generate the prefix sum.
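
The idea of building a prefix sum from rounds of simultaneous additions can be illustrated with a minimal sketch. This is a generic log-step scan simulated on a Python list, not the regioning scheme claimed in the patent; the function name `simd_prefix_sum` and the list standing in for a SIMD register are assumptions made for illustration.

```python
def simd_prefix_sum(values):
    """Inclusive prefix sum built from rounds of lane-wise additions.

    Illustrative only: a Python list stands in for a SIMD register, and
    each list comprehension stands in for one batch of simultaneous adds.
    """
    lanes = list(values)
    stride = 1
    while stride < len(lanes):
        # One round: every lane i >= stride adds the value held by lane
        # i - stride. All reads see the previous round's contents, so the
        # additions in a round are independent of each other.
        lanes = [
            lanes[i] + lanes[i - stride] if i >= stride else lanes[i]
            for i in range(len(lanes))
        ]
        stride *= 2
    return lanes


print(simd_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8]))
```

The first round adds neighbouring elements to form intermediate results; later rounds add intermediate results to other intermediate results, so eight elements need three rounds.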

2.

Granted patent
Method and apparatus for subdividing shader workloads in a graphics processor for efficient machine configuration [Active]

Publication No.: US10360717B1

Publication date: 2019-07-23

Application No.: US15858396

Filing date: 2017-12-29

Applicant: Intel Corporation

Inventor: John G. Gierach, Travis Schluessler, Thomas F. Raoux, Peng Guo

IPC: G06F9/38, G06T1/20, G06T15/00

Abstract: An apparatus and method for splitting shaders. For example, one embodiment of a method comprises: receiving a request for compilation of a shader in a graphics processing environment; determining whether there is sufficient work associated with the shader to justify splitting the shader into two or more blocks of program code; evaluating the program code of the shader to identify dependencies between the blocks of program code if there is sufficient work; subdividing the shader into the two or more blocks in accordance with the identified dependencies; and individually executing the two or more blocks of code on a graphics processor. In addition, one embodiment includes the operations of determining whether any of the regions that can be subdivided are likely to run faster with different machine configurations than if the shader is executed without being subdivided, and subdividing the shader only for those regions that are likely to run faster with different machine configurations.

3.

Granted patent
Hardware instruction set to replace a plurality of atomic operations with a single atomic operation [Active]

Publication No.: US10318292B2

Publication date: 2019-06-11

Application No.: US14543027

Filing date: 2014-11-17

Applicant: Intel Corporation

Inventor: Satyajit Sarangi, Thomas F. Raoux, Guei-Yuan Lueh, Subramaniam Maiyuran

IPC: G06F9/30

Abstract: Systems and methods may process a single atomic operation. An instruction set may be generated to replace a plurality of atomic operations with a single atomic operation. The instruction set may include an accumulation instruction to compute a prefix sum for a plurality of initial values associated with a plurality of processing lanes to generate a plurality of accumulated values. The instruction set may also include a broadcast instruction to return a pre-existing value to be added with each of the plurality of accumulated values to generate a plurality of intermediate accumulated values. In one example, a graphics processor may execute the instruction set to process the single atomic operation.
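
A minimal sketch of how a prefix sum plus a broadcast can stand in for many per-lane atomic additions, assuming each lane wants to add its value to one shared counter and learn the counter's value after its own add. The function name `coalesced_atomic_add` and the one-element list used as the shared location are hypothetical, and plain Python assignment is not actually atomic; the single update below only marks where the one atomic operation would occur.

```python
from itertools import accumulate


def coalesced_atomic_add(counter, lane_values):
    """Apply all lanes' additions to counter[0] with one update.

    counter is a one-element list standing in for a shared memory word
    (an assumption for illustration). Returns, per lane, the value the
    counter would hold after that lane's add if lanes ran in order.
    """
    # Accumulation step: inclusive prefix sum across the lanes.
    accumulated = list(accumulate(lane_values))
    # The single "atomic" update: fetch the pre-existing value and add
    # the total of all lanes in one go.
    old = counter[0]
    counter[0] = old + (accumulated[-1] if accumulated else 0)
    # Broadcast step: the pre-existing value is added to every
    # accumulated value.
    return [old + a for a in accumulated]


counter = [10]
per_lane = coalesced_atomic_add(counter, [1, 2, 3, 4])
print(per_lane, counter[0])
```

Four lanes would otherwise issue four atomic additions; here the shared location is touched once and the per-lane results are reconstructed locally.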

4.

Patent application
FACILITATING DYNAMIC RUNTIME TRANSFORMATION OF GRAPHICS PROCESSING COMMANDS FOR IMPROVED GRAPHICS PERFORMANCE AT COMPUTING DEVICES [Pending, published]

Publication No.: US20160364828A1

Publication date: 2016-12-15

Application No.: US14738679

Filing date: 2015-06-12

Applicant: INTEL CORPORATION

Inventor: James A. Valerio, Abhishek Venkatesh, Satyajit Sarangi, Michael Apodaca, Thomas F. Raoux, Hashem Hashemi, Rama S.B. Harihara

IPC: G06T1/20

Abstract: A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.

5.

Granted patent
Techniques to manage execution of divergent shaders [Active]

Publication No.: US11776195B2

Publication date: 2023-10-03

Application No.: US17463320

Filing date: 2021-08-31

Applicant: Intel Corporation

Inventor: John G. Gierach, Karthik Vaidyanathan, Thomas F. Raoux

IPC: G06T15/00, G06F9/48

CPC classification number: G06T15/005, G06F9/4887

Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
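
The batching behaviour described for the sorting unit can be sketched as follows. This is an illustrative software model only: the class name `SortingUnit`, the `request` method, and the `dispatched` log are hypothetical, and the sketch omits the call stack used to pass results back to the main routine.

```python
from collections import defaultdict


class SortingUnit:
    """Toy model: group subroutine requests into SIMD-width batches."""

    def __init__(self, simd_width):
        self.simd_width = simd_width
        # Pending requests, keyed by the subroutine they ask for.
        self.pending = defaultdict(list)
        # Batches released for execution, in dispatch order.
        self.dispatched = []

    def request(self, subroutine, args):
        """Queue one request; dispatch once a full SIMD batch exists."""
        queue = self.pending[subroutine]
        queue.append(args)
        if len(queue) == self.simd_width:
            # Enough instantiations of the same subroutine to fill
            # every lane: release them together.
            self.dispatched.append((subroutine, list(queue)))
            queue.clear()


unit = SortingUnit(simd_width=4)
for ray_id in range(9):
    unit.request("hit_shader", ray_id)
print(unit.dispatched)
```

With a width of 4 and nine requests, two full batches are released and one request stays queued until more arrive.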

6.

Granted patent
Techniques to manage execution of divergent shaders [Active]

Publication No.: US11107263B2

Publication date: 2021-08-31

Application No.: US16190021

Filing date: 2018-11-13

Applicant: Intel Corporation

Inventor: John G. Gierach, Karthik Vaidyanathan, Thomas F. Raoux

IPC: G06T15/00, G06F9/48

Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.

7.

Granted patent
Facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance at computing devices [Active]

Publication No.: US10796397B2

Publication date: 2020-10-06

Application No.: US14738679

Filing date: 2015-06-12

Applicant: INTEL CORPORATION

Inventor: James A. Valerio, Abhishek Venkatesh, Satyajit Sarangi, Michael Apodaca, Thomas F. Raoux, Hashem Hashemi, Rama S. B. Harihara

IPC: G06T1/20, G06F9/30

Abstract: A mechanism is described for facilitating dynamic runtime transformation of graphics processing commands for improved graphics performance on computing devices. A method of embodiments, as described herein, includes detecting a command stream associated with an application, where the command stream includes dispatches. The method may further include evaluating processing parameters relating to each of the dispatches, where evaluating further includes associating a first plan with one or more of the dispatches to transform the command stream into a transformed command stream. The method may further include associating, based on the first plan, a second plan to the one or more of the dispatches, where the second plan represents the transformed command stream. The method may further include executing the second plan, where execution of the second plan includes processing the transformed command stream in lieu of the command stream.

8.

Granted patent
Method and apparatus for efficient processing of derived uniform values in a graphics processor [Active]

Publication No.: US10726605B2

Publication date: 2020-07-28

Application No.: US15705530

Filing date: 2017-09-15

Applicant: Intel Corporation

Inventor: Travis T. Schluessler, Aleksander Neyman, Guei-Yuan Lueh, Thomas F. Raoux, Bartosz Spitzbarth

IPC: G06T15/00, G06F9/54, G06T15/80, G09G5/36

Abstract: Various embodiments enable low frequency calculation of derived uniform values. A compiler can identify one or more portions of a shader that calculate a derived value based on an input value. For example, this portion may include instructions that use constant values, or the results of prior functions that used constant values. The constant values may include hardcoded values provided by the program (e.g., immediates) and/or other constant values. This portion of the shader can be extracted by the compiler and compiled into a first program. The compiler can compile the remainder of the shader into a second program that receives the derived uniform values from the first program. By extracting the portion(s) of the program that calculates a derived value into a separate program, the derived uniform value or values can be calculated at a lower frequency than if they were calculated for each pixel.
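
A minimal sketch of the split this abstract describes, written as two ordinary Python functions rather than compiled shader programs. The names `derived_uniforms` and `shade_pixel`, and the choice of a rotation as the per-pixel work, are assumptions for illustration; no compiler extraction is modelled here.

```python
import math


def derived_uniforms(angle):
    """'First program': depends only on the uniform input, so it can
    run once per draw instead of once per pixel."""
    return math.cos(angle), math.sin(angle)


def shade_pixel(x, y, uniforms):
    """'Second program': per-pixel work that consumes the precomputed
    derived values instead of recomputing them."""
    c, s = uniforms
    return (x * c - y * s, x * s + y * c)


# The derived values are computed once, then reused for every pixel.
uniforms = derived_uniforms(0.0)
pixels = [shade_pixel(x, y, uniforms) for x in range(2) for y in range(2)]
print(pixels)
```

Without the split, the cosine and sine would be evaluated for every pixel even though the angle is the same across the whole draw.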

9.

Granted patent
Reducing memory latency in graphics operations [Active]

Publication No.: US10552934B2

Publication date: 2020-02-04

Application No.: US15201163

Filing date: 2016-07-01

Applicant: Intel Corporation

Inventor: Michael Apodaca, David M. Cimini, Thomas F. Raoux, Somnath Ghosh, Uddipan Mukherjee, Debraj Bose, Sthiti Deka, Yohai Gevim

IPC: G06T1/20, G06T15/80, G06T1/60, G06F12/0886, G06F12/0855, G06F12/084, G06F12/0831, G06F12/0811, G06F12/0804, G06F9/30, G06F12/00

Abstract: Methods and apparatus relating to reducing memory latency in graphics operations are described. In an embodiment, uniform data is transferred from a buffer to a General Register File (GRF) of a processor based at least in part on information stored in a gather table. The uniform data comprises data that is uniform across a plurality of primitives in a graphics operation. Other embodiments are also disclosed and claimed.

10.

Granted patent
Specialized code paths in GPU processing [Active]

Publication No.: US10140678B2

Publication date: 2018-11-27

Application No.: US15089270

Filing date: 2016-04-01

Applicant: INTEL CORPORATION

Inventor: Saurabh Sharma, Abhishek Venkatesh, Travis T. Schluessler, Thomas F. Raoux, Rahul P. Sathe, Jon Hasselgren

IPC: G06T1/20, G06F8/41, G06F9/38, G06T15/00

Abstract: Techniques to improve graphics processing unit (GPU) performance by introducing specialized code paths to process frequent common values are described. A shader compiler can determine instruction that, during operation, may output a common value and can introduce an enhanced shader instruction branch to process the common value to reduce overall computational requirements to execute the shader.
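
The notion of a specialized branch for a frequently occurring value can be sketched with an alpha blend, assuming fully transparent source pixels (alpha equal to zero) are the common case. The function name `blend` and the choice of alpha as the common value are hypothetical; the sketch shows a hand-written fast path, not a compiler inserting one.

```python
def blend(src, dst, alpha):
    """Alpha-blend src over dst, with a fast path for a common value."""
    if alpha == 0.0:
        # Specialized path: a fully transparent source leaves the
        # destination unchanged, so the arithmetic is skipped.
        return dst
    # General path: full per-channel blend.
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))


print(blend((1.0, 1.0, 1.0), (0.2, 0.4, 0.6), 0.0))
```

The extra comparison costs little, while the fast path avoids all per-channel multiplies whenever the common value appears.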
