-
公开(公告)号:US11669329B2
公开(公告)日:2023-06-06
申请号:US17723312
申请日:2022-04-18
Applicant: Intel Corporation
Inventor: Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George
CPC classification number: G06F9/3802 , G06F9/3001 , G06F9/30018 , G06F9/30145
Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
-
公开(公告)号:US12174783B2
公开(公告)日:2024-12-24
申请号:US17304678
申请日:2021-06-24
Applicant: Intel Corporation
Inventor: Jorge Parra , Wei-yu Chen , Kaiyu Chen , Varghese George , Junjie Gu , Chandra Gurram , Guei-Yuan Lueh , Stephen Junkins , Subramaniam Maiyuran , Supratim Pal
Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
-
公开(公告)号:US20220414053A1
公开(公告)日:2022-12-29
申请号:US17304678
申请日:2021-06-24
Applicant: Intel Corporation
Inventor: Jorge Parra , Wei-yu Chen , Kaiyu Chen , Varghese George , Junjie Gu , Chandra Gurram , Guei-Yuan Lueh , Stephen Junkins , Subramaniam Maiyuran , Supratim Pal
Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
-
公开(公告)号:US20220326953A1
公开(公告)日:2022-10-13
申请号:US17723312
申请日:2022-04-18
Applicant: Intel Corporation
Inventor: Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George
Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
-
公开(公告)号:US20210191724A1
公开(公告)日:2021-06-24
申请号:US16724831
申请日:2019-12-23
Applicant: Intel Corporation
Inventor: Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George
Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
-
公开(公告)号:US20250117360A1
公开(公告)日:2025-04-10
申请号:US18931412
申请日:2024-10-30
Applicant: Intel Corporation
Inventor: Jorge Parra , Wei-yu Chen , Kaiyu Chen , Varghese George , Junjie Gu , Chandra Gurram , Guei-Yuan Lueh , Stephen Junkins , Subramaniam Maiyuran , Supratim Pal
Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
-
公开(公告)号:US11640297B2
公开(公告)日:2023-05-02
申请号:US17304153
申请日:2021-06-15
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Guei-Yuan Lueh , Supratim Pal , Ashutosh Garg , Chandra S. Gurram , Jorge E. Parra , Junjie Gu , Konrad Trifunovic , Hong Bin Liao , Mike B. MacPherson , Shubh B. Shah , Shubra Marwaha , Stephen Junkins , Timothy R. Bauer , Varghese George , Weiyu Chen
Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
-
公开(公告)号:US11314515B2
公开(公告)日:2022-04-26
申请号:US16724831
申请日:2019-12-23
Applicant: Intel Corporation
Inventor: Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George
Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
-
公开(公告)号:US11042370B2
公开(公告)日:2021-06-22
申请号:US15957728
申请日:2018-04-19
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Guei-Yuan Lueh , Supratim Pal , Ashutosh Garg , Chandra S. Gurram , Jorge E. Parra , Junjie Gu , Konrad Trifunovic , Hong Bin Liao , Mike B. Macpherson , Shubh B. Shah , Shubra Marwaha , Stephen Junkins , Timothy R. Bauer , Varghese George , Weiyu Chen
Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
-
-
-
-
-
-
-
-