-
公开(公告)号:US11163578B2
公开(公告)日:2021-11-02
申请号:US15903549
申请日:2018-02-23
Applicant: Intel Corporation
Inventor: Buqi Cheng , Wei-Yu Chen , Guei-Yuan Lueh , Chandra Gurram , Subramaniam Maiyuran
Abstract: Mechanisms for reducing register bank conflicts based on software hint and hardware thread switch are disclosed. In some embodiments, an apparatus for thread switching includes a graphics processing unit (GPU) that includes a plurality of register banks to store operands that are assigned at least partially to avoid register bank conflicts. Decoding circuitry checks a thread switching field of a first instruction to be executed by a first thread. The GPU performs a thread switch mechanism to cause a second instruction to be executed by a second thread when the thread switching field of the first instruction is set.
-
公开(公告)号:US20250037347A1
公开(公告)日:2025-01-30
申请号:US18358297
申请日:2023-07-25
Applicant: Intel Corporation
Inventor: Jiasheng Chen , Supratim Pal , Kevin Hurd , Jorge E. Parra Osorio , Christopher Spencer , Takashi Nakagawa , Guei-Yuan Lueh , Pradeep K. Golconda , James Valerio , Mukundan Swaminathan , Nicholas Murphy , Clifford Gibson , Li-An Tang , Fangwen Fu , Kaiyu Chen , Buqi Cheng
Abstract: Described herein is a graphics processor comprising an instruction cache and a plurality of processing elements coupled with the instruction cache. The plurality of processing elements include functional units configured to provide an integer pipeline to execute instructions to perform operations on integer data elements. The integer pipeline including a first multiplier and a second multiplier, the first multiplier and the second multiplier configured to execute operations for a single instruction.
-
公开(公告)号:US10754651B2
公开(公告)日:2020-08-25
申请号:US16023713
申请日:2018-06-29
Applicant: Intel Corporation
Inventor: Chandra Gurram , Subramaniam Maiyuran , Buqi Cheng , Ashutosh Garg , Guei-Yuan Lueh , Wei-Yu Chen
Abstract: Embodiments are generally directed to register bank conflict reduction for multi-threaded processor execution units. An embodiment of an apparatus includes a processor including one or more execution units (EUs), at least a first execution unit (EU) to process a plurality of threads, the first EU including a register file including multiple register banks with each register bank including multiple registers, and one or more read multiplexers to read registers from the register file, wherein attempting to read more than one register from a single register bank of the register file in a same clock cycle generates a register bank conflict. Registers for each thread for the first EU are distributed across the registers banks within the register file such that a first register for a first thread of the plurality of threads and a following second register for the first thread are located in different register banks within the register file.
-
公开(公告)号:US10692170B2
公开(公告)日:2020-06-23
申请号:US16437961
申请日:2019-06-11
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Supratim Pal , Jorge E. Parra , Chandra S. Gurram , Ashwin J. Shivani , Ashutosh Garg , Brent A. Schwartz , Jorge F. Garcia Pabon , Darin M. Starkey , Shubh B. Shah , Guei-Yuan Lueh , Kaiyu Chen , Konrad Trifunovic , Buqi Cheng , Weiyu Chen
Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.
-
公开(公告)号:US20200004534A1
公开(公告)日:2020-01-02
申请号:US16023713
申请日:2018-06-29
Applicant: Intel Corporation
Inventor: Chandra Gurram , Subramaniam Maiyuran , Buqi Cheng , Ashutosh Garg , Guei-Yuan Lueh , Wei-Yu Chen
Abstract: Embodiments are generally directed to register bank conflict reduction for multi-threaded processor execution units. An embodiment of an apparatus includes a processor including one or more execution units (EUs), at least a first execution unit (EU) to process a plurality of threads, the first EU including a register file including multiple register banks with each register bank including multiple registers, and one or more read multiplexers to read registers from the register file, wherein attempting to read more than one register from a single register bank of the register file in a same clock cycle generates a register bank conflict. Registers for each thread for the first EU are distributed across the registers banks within the register file such that a first register for a first thread of the plurality of threads and a following second register for the first thread are located in different register banks within the register file.
-
公开(公告)号:US20190362460A1
公开(公告)日:2019-11-28
申请号:US16437961
申请日:2019-06-11
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Supratim Pal , Jorge E. Parra , Chandra S. Gurram , Ashwin J. Shivani , Ashutosh Garg , Brent A. Schwartz , Jorge F. Garcia Pabon , Darin M. Starkey , Shubh B. Shah , Guei-Yuan Lueh , Kaiyu Chen , Konrad Trifunovic , Buqi Cheng , Weiyu Chen
Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.
-
公开(公告)号:US20250036412A1
公开(公告)日:2025-01-30
申请号:US18358308
申请日:2023-07-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Jiasheng Chen , Christopher Spencer , Jorge E. Parra Osorio , Kevin Hurd , Guei-Yuan Lueh , Pradeep K. Golconda , Fangwen Fu , Wei Xiong , Hongzheng Li , James Valerio , Mukundan Swaminathan , Nicholas Murphy , Shuai Mu , Clifford Gibson , Buqi Cheng
Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a plurality of processing resources. A processing resource of the plurality of processing resources includes a source crossbar communicatively coupled with a register file, the source crossbar to reorder data elements of a source operand and a format conversion pipeline to convert a plurality of input data elements specified by the source operand from a first format of a plurality of datatype formats to a second format of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
-
公开(公告)号:US20250036361A1
公开(公告)日:2025-01-30
申请号:US18358304
申请日:2023-07-25
Applicant: Intel Corporation
Inventor: Supratim Pal , Jiasheng Chen , Kevin Hurd , Jorge E. Parra Osorio , Christopher Spencer , Guei-Yuan Lueh , Pradeep K. Golconda , Fangwen Fu , Wei Xiong , Hongzheng Li , James Valerio , Mukundan Swaminathan , Nicholas Murphy , Shuai Mu , Clifford Gibson , Buqi Cheng
IPC: G06F7/483
Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a multi-lane parallel floating-point unit and a multi-lane parallel integer unit. The multi-lane parallel integer unit includes an integer pipeline including a plurality of parallel integer logic units configured to perform integer compute operations on a plurality of input data elements and a format conversion pipeline including a plurality of parallel format conversion units configured to convert a plurality of input data elements from a first one of a plurality of datatype formats to a second one of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.
-
公开(公告)号:US10360654B1
公开(公告)日:2019-07-23
申请号:US15990328
申请日:2018-05-25
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Supratim Pal , Jorge E. Parra , Chandra S. Gurram , Ashwin J. Shivani , Ashutosh Garg , Brent A. Schwartz , Jorge F. Garcia Pabon , Darin M. Starkey , Shubh B. Shah , Guei-Yuan Lueh , Kaiyu Chen , Konrad Trifunovic , Buqi Cheng , Weiyu Chen
Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.
-
-
-
-
-
-
-
-