Patent search: ap:("Intel Corporation") AND inv:"Kaiyu Chen" (results page 1)

1.

Invention publication

CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE (Status: Pending, published)

Publication number: US20240168807A1

Publication date: 2024-05-23

Application number: US18056949

Application date: 2022-11-18

Applicant: Intel Corporation

Inventors: Jorge Eduardo Parra Osorio, Guei-Yuan Lueh, Maxim Kazakov, Fangwen Fu, Supratim Pal, Kaiyu Chen

IPC: G06F9/50, G06F9/48, G06F9/52, G06F15/80

CPC classification number: G06F9/5027, G06F9/48, G06F9/522, G06F15/8046

Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.
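The abstract describes an operand fetch that reads from another thread's register space. The minimal Python model below illustrates the idea under two assumptions: register spaces are plain dictionaries, and the instruction carries a hypothetical `b_from_thread` field that names the thread whose registers supply operand B. `Thread`, `MatMulInstr` and `execute` are illustrative names only. They are not Intel's ISA or hardware interface.

```python
from dataclasses import dataclass, field
from typing import Optional

Matrix = list[list[int]]

@dataclass
class Thread:
    tid: int
    regs: dict[str, Matrix] = field(default_factory=dict)  # this thread's register space

@dataclass
class MatMulInstr:
    dst: str
    src_a: str
    src_b: str
    b_from_thread: Optional[int] = None  # hypothetical cross-thread flag for operand B

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def execute(instr: MatMulInstr, thread: Thread, threads: dict[int, Thread]) -> None:
    a = thread.regs[instr.src_a]
    # When flagged, read operand B directly from the other thread's register
    # space instead of copying it into the issuing thread's registers first.
    owner = thread if instr.b_from_thread is None else threads[instr.b_from_thread]
    b = owner.regs[instr.src_b]
    thread.regs[instr.dst] = matmul(a, b)  # the result lands in the issuing thread

# Usage: thread 0 multiplies its r0 by r8, which lives only in thread 1.
threads = {0: Thread(0, {"r0": [[1, 2], [3, 4]]}), 1: Thread(1, {"r8": [[5, 6], [7, 8]]})}
execute(MatMulInstr(dst="r4", src_a="r0", src_b="r8", b_from_thread=1), threads[0], threads)
print(threads[0].regs["r4"])  # [[19, 22], [43, 50]]
```

In this sketch, the shared operand is never duplicated in the issuing thread's register space. That matches the register-pressure motivation the title suggests.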

2.

Granted invention patent

Systolic array of arbitrary physical and logical depth (Status: In force)

Publication number: US12174783B2

Publication date: 2024-12-24

Application number: US17304678

Application date: 2021-06-24

Applicant: Intel Corporation

Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal

IPC: G06F15/80, G06F9/50, G06F9/54, G06T1/20

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
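A small model can illustrate the split between physical depth (the pipeline stages in the array) and logical depth (the multiply-accumulate steps the instruction requests). It assumes each physical stage performs one accumulation step along the shared K dimension. When the logical depth exceeds the physical depth, the model feeds the partial sums back for another pass. `systolic_matmul` is a hypothetical name, and the model ignores timing and data skew.

```python
import math

Matrix = list[list[int]]

def systolic_matmul(a: Matrix, b: Matrix, c: Matrix, physical_depth: int) -> tuple[Matrix, int]:
    """Compute C + A @ B, processing at most `physical_depth` K-steps per pass."""
    logical_depth = len(b)                       # K: accumulation steps the instruction asks for
    passes = math.ceil(logical_depth / physical_depth)
    out = [row[:] for row in c]                  # accumulator starts from C (C is not modified)
    for p in range(passes):
        k0 = p * physical_depth
        # One pass through the physical pipeline: stage s handles k = k0 + s.
        # On the last pass, unused stages are simply skipped.
        for s in range(min(physical_depth, logical_depth - k0)):
            k = k0 + s
            for i in range(len(a)):
                for j in range(len(b[0])):
                    out[i][j] += a[i][k] * b[k][j]
    return out, passes
```

For example, a logical depth of 8 on a 4-stage array takes two passes. The same instruction on an 8-stage array completes in one pass.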

3.

Invention application

GLOBAL OPTIMAL PATH DETERMINATION UTILIZING PARALLEL PROCESSING (Status: Pending, published)

Publication number: US20190180406A1

Publication date: 2019-06-13

Application number: US15839640

Application date: 2017-12-12

Applicant: Intel Corporation

Inventors: Yuenian Yang, Kaiyu Chen, Andrew Kuzma

IPC: G06T1/20, G06F9/38, G06T1/60, G06T15/00

Abstract: Embodiments are generally directed to global optimal path determination utilizing parallel processing. An embodiment of an apparatus includes a central processing unit (CPU); a graphical processing unit (GPU), the GPU being capable of a plurality of processing threads; and a memory to store data for a system under evaluation, the system under evaluation including a set of nodes having a first endpoint, a second endpoint, and multiple paths between the first endpoint and the second endpoint. The apparatus is to determine a most energy efficient path between the first endpoint and the second endpoint utilizing parallel processing of a push and relabel graph cut algorithm. Performance of the push and relabel algorithm includes a plurality of process iterations, each process iteration including performance of a relabel operation, a push operation in a first direction, and a push operation in a second direction.
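For reference, the sketch below is the textbook sequential FIFO push-relabel max-flow algorithm. The minimum cut can be read off its final residual graph. It is not the patent's parallel GPU formulation, in which each iteration performs a relabel followed by pushes in two directions. The function name and the `(u, v, capacity)` edge format are assumptions for illustration.

```python
from collections import deque

def push_relabel_max_flow(n: int, edges: list[tuple[int, int, int]], s: int, t: int) -> int:
    """Sequential FIFO push-relabel; returns the max-flow value from s to t."""
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
    flow = [[0] * n for _ in range(n)]       # skew-symmetric: flow[v][u] == -flow[u][v]
    height = [0] * n
    excess = [0] * n
    height[s] = n
    active = deque()
    # Preflow: saturate every edge leaving the source.
    for v in range(n):
        if cap[s][v] > 0:
            flow[s][v], flow[v][s] = cap[s][v], -cap[s][v]
            excess[v] += cap[s][v]
            excess[s] -= cap[s][v]
            if v != t:
                active.append(v)
    while active:
        u = active.popleft()
        while excess[u] > 0:                  # discharge u completely
            pushed = False
            for v in range(n):
                residual = cap[u][v] - flow[u][v]
                if residual > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], residual)
                    flow[u][v] += delta
                    flow[v][u] -= delta
                    excess[u] -= delta
                    excess[v] += delta
                    if v not in (s, t) and excess[v] == delta:  # v just became active
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(height[v] for v in range(n) if cap[u][v] - flow[u][v] > 0)
    return excess[t]
```

After the loop finishes, the vertices still reachable from `s` in the residual graph form the source side of a minimum cut.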

4.

Invention application

32-BIT CHANNEL-ALIGNED INTEGER MULTIPLICATION VIA MULTIPLE MULTIPLIERS PER-CHANNEL (Status: In force)

Publication number: US20250037347A1

Publication date: 2025-01-30

Application number: US18358297

Application date: 2023-07-25

Applicant: Intel Corporation

Inventors: Jiasheng Chen, Supratim Pal, Kevin Hurd, Jorge E. Parra Osorio, Christopher Spencer, Takashi Nakagawa, Guei-Yuan Lueh, Pradeep K. Golconda, James Valerio, Mukundan Swaminathan, Nicholas Murphy, Clifford Gibson, Li-An Tang, Fangwen Fu, Kaiyu Chen, Buqi Cheng

IPC: G06T15/00, G06F9/30

Abstract: Described herein is a graphics processor comprising an instruction cache and a plurality of processing elements coupled with the instruction cache. The plurality of processing elements include functional units configured to provide an integer pipeline to execute instructions to perform operations on integer data elements. The integer pipeline including a first multiplier and a second multiplier, the first multiplier and the second multiplier configured to execute operations for a single instruction.
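A common way for two multipliers to cooperate on one 32-bit multiply is to split one operand into 16-bit halves and then recombine the partial products. The sketch assumes two hypothetical 32x16-bit multipliers and computes the low 32 bits of the product. In two's complement, these low bits are identical for signed and unsigned operands. The abstract does not say how Intel's pipeline actually partitions the work, so `mul_32x16` and `mul32_lo` are illustrative names only.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF

def mul_32x16(a: int, b16: int) -> int:
    """Hypothetical narrow multiplier: 32-bit x 16-bit -> up to 48-bit product."""
    if not (0 <= a <= MASK32 and 0 <= b16 <= MASK16):
        raise ValueError("operand out of range for a 32x16 multiplier")
    return a * b16

def mul32_lo(a: int, b: int) -> int:
    """Low 32 bits of a 32x32 multiply built from two 32x16 partial products."""
    p_lo = mul_32x16(a, b & MASK16)           # first multiplier:  a * b[15:0]
    p_hi = mul_32x16(a, (b >> 16) & MASK16)   # second multiplier: a * b[31:16]
    # a * b == p_lo + p_hi * 2**16; keep only the bits a 32-bit destination holds.
    return (p_lo + (p_hi << 16)) & MASK32
```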

5.

Invention application

SYSTOLIC ARRAY OF ARBITRARY PHYSICAL AND LOGICAL DEPTH (Status: In force)

Publication number: US20220414053A1

Publication date: 2022-12-29

Application number: US17304678

Application date: 2021-06-24

Applicant: Intel Corporation

Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal

IPC: G06F15/80, G06F9/50, G06F9/54, G06T1/20

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

6.

Granted invention patent

Software scoreboard information and synchronization (Status: In force)

Publication number: US10692170B2

Publication date: 2020-06-23

Application number: US16437961

Application date: 2019-06-11

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen

IPC: G06F9/38, G06F8/41, G06T1/20, G06F9/30, G06T1/60, G09G5/36, G06T15/00

Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.
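A compiler pass that encodes software scoreboard information might work roughly as follows. Each long-latency instruction receives a token (SBID), and any later instruction that reads or overwrites that instruction's destination is annotated with a wait on the token. The instruction format, the `num_tokens` limit and all names here are hypothetical. The sketch tracks only read-after-write and write-after-write hazards on long-latency results.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Instr:
    op: str
    dst: Optional[str]
    srcs: list[str]
    long_latency: bool = False                         # e.g. a memory send whose result arrives later
    sbid: Optional[int] = None                         # token this instruction sets (compiler-assigned)
    wait_on: list[int] = field(default_factory=list)   # tokens to wait for before issue

def assign_scoreboard(program: list[Instr], num_tokens: int = 16) -> list[Instr]:
    pending: dict[str, int] = {}   # register -> token of its in-flight writer
    issued = 0
    for ins in program:
        # RAW (read) and WAW (overwrite) hazards against in-flight writes.
        touched = ins.srcs + ([ins.dst] if ins.dst else [])
        for reg in touched:
            if reg in pending:
                tok = pending.pop(reg)
                if tok not in ins.wait_on:
                    ins.wait_on.append(tok)
        if ins.long_latency and ins.dst:
            tok = issued % num_tokens
            issued += 1
            # Reusing a token that is still in flight: wait for its writer first.
            for reg in [r for r, t in pending.items() if t == tok]:
                del pending[reg]
                if tok not in ins.wait_on:
                    ins.wait_on.append(tok)
            ins.sbid = tok
            pending[ins.dst] = tok
    return program
```

Because the dependency analysis runs at compile time, hardware scheduling in this model reduces to checking `wait_on` against the set of completed tokens. That is the hardware simplification the abstract claims.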

7.

Invention application

SOFTWARE SCOREBOARD INFORMATION AND SYNCHRONIZATION (Status: Pending, published)

Publication number: US20190362460A1

Publication date: 2019-11-28

Application number: US16437961

Application date: 2019-06-11

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen

IPC: G06T1/20, G06F9/30, G06F9/38, G06F8/41

Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.

8.

Invention application

GRAPHICS PROCESSOR REGISTER DATA RE-USE MECHANISM (Status: Pending, published)

Publication number: US20190304056A1

Publication date: 2019-10-03

Application number: US15938078

Application date: 2018-03-28

Applicant: Intel Corporation

Inventors: Slawomir Grajewski, Kaiyu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran

IPC: G06T1/60, G06T15/00, G06T1/20

Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a plurality of execution units to process graphics context data and a register file having a plurality of registers to store the graphics context data; and register renaming logic to facilitate re-use of register data by partitioning a first part and a second part, the first part to include thread-independent code and the second part to include thread-dependent code.

9.

Invention application

Divergent Control Flow for Fused EUs (Status: Pending, published)

Publication number: US20170372446A1

Publication date: 2017-12-28

Application number: US15190663

Application date: 2016-06-23

Applicant: Intel Corporation

Inventors: Pratik J. Ashar, Guei-Yuan Ken Lueh, Kaiyu Chen, Subramaniam Maiyuran, Brent A. Schwartz, Darin M. Starkey

IPC: G06T1/20, G06F9/455, G06F9/38

Abstract: Embodiments provide support for divergent control flow in heterogeneous compute operations on a fused execution unit. One embodiment provides for a processing apparatus comprising a fused execution unit including multiple graphics execution units having a common instruction pointer; logic to serialize divergent function calls by the fused execution unit, the logic configured to compare a call target of execution channels within the fused execution unit and create multiple groups of channels, each group of channels associated with a single call target; and wherein the fused execution unit is to execute a first group of channels via a first execution unit and a second group of channels via a second execution unit.
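The grouping step in this abstract compares per-channel call targets and forms one group per target. It can be sketched as follows. In this sketch, channels are indexed by position and `None` marks a disabled channel. The abstract describes only the two-group case, so the round-robin assignment of further groups to execution units is an assumption for illustration.

```python
from collections import defaultdict
from typing import Optional

def serialize_divergent_calls(
    call_targets: list[Optional[str]], num_eus: int = 2
) -> list[tuple[int, str, list[int]]]:
    """Group channels by call target; return (eu_index, target, channels) in issue order."""
    groups: dict[str, list[int]] = defaultdict(list)   # insertion-ordered: first-seen target first
    for channel, target in enumerate(call_targets):
        if target is None:          # channel masked off: takes no part in the call
            continue
        groups[target].append(channel)
    schedule = []
    for index, (target, channels) in enumerate(groups.items()):
        # Hypothetical policy: alternate groups across the fused EUs.
        schedule.append((index % num_eus, target, channels))
    return schedule
```

For eight channels calling `f, g, f, g, f, h, h, g`, this sketch issues `f` (channels 0, 2, 4) on EU 0, `g` (1, 3, 7) on EU 1, and then `h` (5, 6) on EU 0.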

10.

Invention application

SYSTOLIC ARRAY OF ARBITRARY PHYSICAL AND LOGICAL DEPTH (Status: In force)

Publication number: US20250117360A1

Publication date: 2025-04-10

Application number: US18931412

Application date: 2024-10-30

Applicant: Intel Corporation

Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal

IPC: G06F15/80, G06F9/30

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
