Patent search ap:("Intel Corporation") AND inv:"Jorge Parra" Page 3

21.

Invention application
DUAL PIPELINE PARALLEL SYSTOLIC ARRAY [In force]

Publication (announcement) number: US20250117359A1

Publication (announcement) date: 2025-04-10

Application number: US18913758

Application date: 2024-10-11

Applicant: Intel Corporation

Inventor: Jorge Parra , Jiasheng Chen , Supratim Pal , Fangwen Fu , Sabareesh Ganapathy , Chandra Gurram , Chunhui Mei , Yue Qi

IPC: G06F9/38 , G06F9/30

Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.

22.

Invention publication
UTILIZING STRUCTURED SPARSITY IN SYSTOLIC ARRAYS [Under examination, published]

Publication (announcement) number: US20240320000A1

Publication (announcement) date: 2024-09-26

Application number: US18621539

Application date: 2024-03-29

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Jorge Parra , Ashutosh Garg , Chandra Gurram , Chunhui Mei , Durgesh Borkar , Shubra Marwaha , Supratim Pal , Varghese George , Wei Xiong , Yan Li , Yongsheng Liu , Dipankar Das , Sasikanth Avancha , Dharma Teja Vooturi , Naveen K. Mellempudi

IPC: G06F9/30 , G06F9/38 , G06F15/80

CPC classification number: G06F9/30036 , G06F9/3001 , G06F9/30101 , G06F9/3893 , G06F15/8046

Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
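Note: the metadata-driven operand selection described in this abstract can be illustrated with a minimal Python sketch. It is not the patented circuit. The 2-of-4 grouping, the function name `sparse_dot`, and the metadata encoding (one in-group index per packed value) are assumptions made for illustration only.

```python
def sparse_dot(dense, packed, metadata, group=4):
    """Dot product of an unpacked vector with a structured-sparse vector.

    dense    : unpacked source data
    packed   : values kept from each group of `group` elements (hypothetical layout)
    metadata : for each packed value, its position (0..group-1) inside its group
    Assumes every group keeps the same number of values (for example 2 of 4).
    """
    kept_per_group = len(packed) // (len(dense) // group)
    total = 0
    for i, (value, offset) in enumerate(zip(packed, metadata)):
        base = (i // kept_per_group) * group   # start of the group this value came from
        total += dense[base + offset] * value  # metadata picks the matching dense element
    return total


dense = [1, 2, 3, 4, 5, 6, 7, 8]
# Sparse vector [0, 10, 0, 20, 30, 0, 0, 40] packed as 2 values per group of 4.
packed = [10, 20, 30, 40]
metadata = [1, 3, 0, 3]
result = sparse_dot(dense, packed, metadata)
```

Only four multiplications are performed instead of eight, and the result matches the full dot product against the unpacked sparse vector.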

23.

Granted invention patent
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format [In force]

Publication (announcement) number: US11709793B2

Publication (announcement) date: 2023-07-25

Application number: US17827067

Application date: 2022-05-27

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06T15/06 , G06F9/30 , G06F15/78 , G06F9/38 , G06F17/18 , G06F12/0802 , G06F7/544 , G06F7/575 , G06F12/02 , G06F12/0866 , G06F12/0875 , G06F12/0895 , G06F12/128 , G06F12/06 , G06F12/1009 , G06T1/20 , G06T1/60 , H03M7/46 , G06F12/0811 , G06F15/80 , G06F17/16 , G06F7/58 , G06F12/0871 , G06F12/0862 , G06F12/0897 , G06F9/50 , G06F12/0804 , G06F12/0882 , G06F12/0891 , G06F12/0893 , G06F12/0888 , G06N3/08

CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/3004 , G06F9/30014 , G06F9/30036 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06

Abstract: Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

24.

Granted invention patent
Computing efficient cross channel operations in parallel computing machines using systolic arrays [In force]

Publication (announcement) number: US11669490B2

Publication (announcement) date: 2023-06-06

Application number: US17518202

Application date: 2021-11-03

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Chandra Gurram

IPC: G06F15/80 , G06N20/00 , G06F17/16

CPC classification number: G06F15/8046 , G06F15/8007 , G06F17/16 , G06N20/00

Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.

25.

Invention application
USING SPARSITY METADATA TO REDUCE SYSTOLIC ARRAY POWER CONSUMPTION [In force]

Publication (announcement) number: US20220413924A1

Publication (announcement) date: 2022-12-29

Application number: US17358542

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Jorge Parra , Supratim Pal , Jiasheng Chen , Chandra Gurram

IPC: G06F9/50 , G06F15/80 , G06F7/523 , G06F7/50 , G06T1/20

Abstract: A processing apparatus can include a general-purpose parallel processing engine comprising a matrix accelerator including a multi-stage systolic array, where each stage includes multiple processing elements associated with multiple processing channels. The multiple processing elements are configured to receive output sparsity metadata that is independent of input sparsity of input matrix elements and perform processing operations on the input matrix elements based on the output sparsity metadata.

26.

Invention application
SYSTOLIC ARRAY HAVING SUPPORT FOR OUTPUT SPARSITY [In force]

Publication (announcement) number: US20220413803A1

Publication (announcement) date: 2022-12-29

Application number: US17304803

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Jorge Parra , Fangwen Fu , Subramaniam Maiyuran , Varghese George , Mike Macpherson , Supratim Pal , Chandra Gurram , Sabareesh Ganapathy , Sasikanth Avancha , Dharma Teja Vooturi , Naveen Mellempudi , Dipankar Das

IPC: G06F7/544 , G06F7/523 , G06F15/80 , G06F17/16

Abstract: A processing apparatus is described herein that includes a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements configured to perform processing operations on input matrix elements based on output sparsity metadata. The output sparsity metadata indicates to the multiple processing elements to bypass multiplication for a first row of elements of a second matrix and multiply a second row of elements of the second matrix with a column of matrix elements of a first matrix.
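Note: the row bypass described in this abstract can be sketched in a few lines of Python. This is an illustrative model only. The function name `matmul_output_sparse`, the boolean `keep_row` encoding of the output sparsity metadata, and the choice to emit zeros for bypassed rows are assumptions, not details taken from the patent.

```python
def matmul_output_sparse(first, second, keep_row):
    """Multiply rows of `second` by columns of `first`, skipping masked rows.

    keep_row[i] is a hypothetical one-bit-per-row form of the output sparsity
    metadata: False means the multiplications for row i of `second` are bypassed.
    """
    cols = len(first[0])
    out = []
    for row, keep in zip(second, keep_row):
        if not keep:
            out.append([0] * cols)  # bypassed: no multiplications are performed
            continue
        out.append([
            sum(value * first[k][j] for k, value in enumerate(row))
            for j in range(cols)
        ])
    return out


first = [[1, 2], [3, 4]]
second = [[5, 6], [7, 8]]
result = matmul_output_sparse(first, second, keep_row=[False, True])
```

The mask depends only on which outputs are wanted, not on whether the inputs happen to contain zeros, which is the sense in which the metadata describes output sparsity.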

27.

Granted invention patent
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format [In force]

Publication (announcement) number: US11361496B2

Publication (announcement) date: 2022-06-14

Application number: US17304092

Application date: 2021-06-14

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06T15/06 , G06F9/30 , G06F9/38 , G06F17/18

Abstract: Described herein is a graphics processing unit (GPU) comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent and process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add a first source operand to a result of the multiply.
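Note: the BF16 multiply-add described in this abstract can be approximated in Python as follows. This is a rough sketch under stated assumptions: BF16 is modeled as the upper 16 bits of an IEEE 754 float32 (1 sign bit, 8 exponent bits, 7 mantissa bits), conversion uses plain truncation rather than whatever rounding the hardware applies, and the sum is kept in a Python float. The names `to_bf16` and `bf16_mad` are hypothetical.

```python
import struct


def to_bf16(x: float) -> float:
    """Return x reduced to bfloat16 precision by truncating its float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFF0000  # keep the sign, the 8 exponent bits and the top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_mad(src0: float, src1: float, src2: float) -> float:
    """Multiply the second and third operands, then add the first operand."""
    return to_bf16(src0) + to_bf16(src1) * to_bf16(src2)


pi_bf16 = to_bf16(3.14159)
result = bf16_mad(1.0, 2.0, 3.0)
```

Because the exponent field is the same width as in float32, BF16 keeps float32's dynamic range and gives up only mantissa precision, which is visible in `pi_bf16`.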

28.

Granted invention patent
Sparse matrix multiplication acceleration mechanism [In force]

Publication (announcement) number: US11188618B2

Publication (announcement) date: 2021-11-30

Application number: US16561715

Application date: 2019-09-05

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Mathew Nevin , Jorge Parra , Ashutosh Garg , Shubra Marwaha , Shubh Shah

IPC: G06F17/16 , G06F7/487 , G06F9/30 , G06F13/16

Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
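Note: the zero-detection idea in this abstract can be illustrated with a small Python sketch. It shows only one possible optimization (skipping a multiply-add when either operand is zero); the abstract does not say which optimizations the hardware applies, so the function `dot_skip_zeros` and its counting of performed operations are assumptions for illustration.

```python
def dot_skip_zeros(a, b):
    """Dot product that skips multiply-adds whose result is known to be zero.

    Returns (total, performed), where `performed` counts the multiply-add
    operations that were actually executed.
    """
    total = 0
    performed = 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            continue  # zero detected: the product contributes nothing
        total += x * y
        performed += 1
    return total, performed


total, performed = dot_skip_zeros([1, 0, 3, 0], [4, 5, 0, 7])
```

In this example three of the four multiply-adds are skipped without changing the result.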

29.

Granted invention patent
Using sparsity metadata to reduce systolic array power consumption [In force]

Publication (announcement) number: US12190158B2

Publication (announcement) date: 2025-01-07

Application number: US17358542

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Jorge Parra , Supratim Pal , Jiasheng Chen , Chandra Gurram

IPC: G06F7/50 , G06F1/329 , G06F7/523 , G06F7/544 , G06F9/38 , G06F9/50 , G06F15/80 , G06F17/16 , G06T1/20

Abstract: A processing apparatus can include a general-purpose parallel processing engine comprising a matrix accelerator including a multi-stage systolic array, where each stage includes multiple processing elements associated with multiple processing channels. The multiple processing elements are configured to receive output sparsity metadata that is independent of input sparsity of input matrix elements and perform processing operations on the input matrix elements based on the output sparsity metadata.

30.

Granted invention patent
Systolic array of arbitrary physical and logical depth [In force]

Publication (announcement) number: US12174783B2

Publication (announcement) date: 2024-12-24

Application number: US17304678

Application date: 2021-06-24

Applicant: Intel Corporation

Inventor: Jorge Parra , Wei-yu Chen , Kaiyu Chen , Varghese George , Junjie Gu , Chandra Gurram , Guei-Yuan Lueh , Stephen Junkins , Subramaniam Maiyuran , Supratim Pal

IPC: G06F15/80 , G06F9/50 , G06F9/54 , G06T1/20

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
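Note: the relationship between logical depth and physical depth in this abstract can be sketched in Python. This is a simplified model, assuming each pipeline stage performs one multiply-accumulate and that the accumulator is fed back into the array between passes. The function name `run_systolic` and the rule that the pass count is the logical depth divided by the physical depth, rounded up, are assumptions; the abstract only says that one or more passes are made.

```python
def run_systolic(acc, a_vals, b_vals, physical_depth):
    """Accumulate a_vals[i] * b_vals[i] using a pipeline of `physical_depth` stages.

    The logical depth is len(a_vals). When it exceeds the physical depth,
    the partial result is looped back for additional passes.
    Returns (accumulator, number_of_passes).
    """
    logical_depth = len(a_vals)
    passes = 0
    for start in range(0, logical_depth, physical_depth):
        end = min(start + physical_depth, logical_depth)
        for stage in range(start, end):  # one multiply-accumulate per physical stage
            acc += a_vals[stage] * b_vals[stage]
        passes += 1
    return acc, passes


a_vals = [1, 2, 3, 4, 5, 6, 7, 8]
b_vals = [1] * 8
deep = run_systolic(0, a_vals, b_vals, physical_depth=8)
shallow = run_systolic(0, a_vals, b_vals, physical_depth=4)
```

Both configurations produce the same accumulated value; the shallower array simply needs more passes for the same logical depth.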
