-
公开(公告)号:US12039001B2
公开(公告)日:2024-07-16
申请号:US18301386
申请日:2023-04-17
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Ashutosh Garg , Shubra Marwaha , Chandra Gurram , Darin Starkey , Durgesh Borkar , Varghese George
CPC classification number: G06F17/16 , G06F9/3001 , G06F9/30145 , G06F15/8046
Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.
-
公开(公告)号:US12008067B2
公开(公告)日:2024-06-11
申请号:US17527324
申请日:2021-11-16
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Mathew Nevin , Jorge Parra , Ashutosh Garg , Shubra Marwaha , Shubh Shah
CPC classification number: G06F17/16 , G06F7/4876 , G06F9/3001 , G06F9/30036 , G06F13/1673 , G06F2207/3892
Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
-
公开(公告)号:US11977885B2
公开(公告)日:2024-05-07
申请号:US17107823
申请日:2020-11-30
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Jorge Parra , Ashutosh Garg , Chandra Gurram , Chunhui Mei , Durgesh Borkar , Shubra Marwaha , Supratim Pal , Varghese George , Wei Xiong , Yan Li , Yongsheng Liu , Dipankar Das , Sasikanth Avancha , Dharma Teja Vooturi , Naveen K. Mellempudi
CPC classification number: G06F9/30036 , G06F9/3001 , G06F9/30101 , G06F9/3893 , G06F15/8046
Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
-
公开(公告)号:US20240111826A1
公开(公告)日:2024-04-04
申请号:US17937252
申请日:2022-09-30
Applicant: Intel Corporation
Inventor: Jiasheng Chen , Kevin Hurd , Changwon Rhee , Jorge Parra , Fangwen Fu , Theo Drane , William Zorn , Peter Caday , Gregory Henry , Guei-Yuan Lueh , Farzad Chehrazi , Amit Karande , Turbo Majumder , Xinmin Tian , Milind Girkar , Hong Jiang
CPC classification number: G06F17/16 , G06F7/5443 , G06T1/20
Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including a multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as final result in the DP floating-point format.
-
15.
公开(公告)号:US20230367740A1
公开(公告)日:2023-11-16
申请号:US18310129
申请日:2023-05-01
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Chandra Gurram
IPC: G06F15/80
CPC classification number: G06F15/8046 , G06F15/8007 , G06N20/00
Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.
-
公开(公告)号:US11636174B2
公开(公告)日:2023-04-25
申请号:US17527882
申请日:2021-11-16
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Ashutosh Garg , Shubra Marwaha , Chandra Gurram , Darin Starkey , Durgesh Borkar , Varghese George
Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.
-
公开(公告)号:US20220414054A1
公开(公告)日:2022-12-29
申请号:US17304797
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Jorge Parra , Jiasheng Chen , Supratim Pal , Fangwen Fu , Sabareesh Ganapathy , Chandra Gurram , Chunhui Mei , Yue Qi
Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.
-
公开(公告)号:US20220414053A1
公开(公告)日:2022-12-29
申请号:US17304678
申请日:2021-06-24
Applicant: Intel Corporation
Inventor: Jorge Parra , Wei-yu Chen , Kaiyu Chen , Varghese George , Junjie Gu , Chandra Gurram , Guei-Yuan Lueh , Stephen Junkins , Subramaniam Maiyuran , Supratim Pal
Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
-
19.
公开(公告)号:US20210365402A1
公开(公告)日:2021-11-25
申请号:US16900236
申请日:2020-06-12
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Chandra Gurram
IPC: G06F15/80
Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, the systolic array circuit modified to receive inputs from the single source register and route elements of the single source register to multiple channels in the systolic array circuit.
-
公开(公告)号:US20250117360A1
公开(公告)日:2025-04-10
申请号:US18931412
申请日:2024-10-30
Applicant: Intel Corporation
Inventor: Jorge Parra , Wei-yu Chen , Kaiyu Chen , Varghese George , Junjie Gu , Chandra Gurram , Guei-Yuan Lueh , Stephen Junkins , Subramaniam Maiyuran , Supratim Pal
Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
-
-
-
-
-
-
-
-
-