Patent search ap:("Intel Corporation") AND inv:"Ashutosh Garg" Page 5

41.

发明公开
SPARSE OPTIMIZATIONS FOR A MATRIX ACCELERATOR ARCHITECTURE 审中-公开

公开(公告)号：US20230351543A1

公开(公告)日：2023-11-02

申请号：US18310688

申请日：2023-05-02

Applicant: Intel Corporation

Inventor： Joydeep Ray , Scott Janus , Varghese George , Subramaniam Maiyuran , Altug Koker , Abhishek Appu , Prasoonkumar Surti , Vasanth Ranganathan , Valentin Andrei , Ashutosh Garg , Yoav Harel , Arthur Hunter, JR. , SungYe Kim , Mike Macpherson , Elmoustapha Ould-Ahmed-Vall , William Sadler , Lakshminarayanan Striramassarma , Vikranth Vemulapalli

IPC: G06N3/084 , G06F15/80 , G06F17/16 , G06N3/048 , G06T1/20 , G06F9/50 , G06F12/0806 , G06F7/544 , G06N3/08

CPC classification number: G06T1/20 , G06F7/5443 , G06F9/5027 , G06F12/0806 , G06F15/8046 , G06F17/16 , G06N3/048 , G06N3/08 , G06N3/084

Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiment described herein provided techniques to detect zero value elements within a vector or a set of packed data elements output by a processing resource and generate metadata to indicate a location of the zero value elements within the plurality of data elements.

42.

发明公开
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT 审中-公开

公开(公告)号：US20230195685A1

公开(公告)日：2023-06-22

申请号：US18170900

申请日：2023-02-17

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06F15/78 , G06F9/30 , G06F12/128 , G06F17/16 , G06F12/0811 , G06F12/02 , G06F12/0866 , G06F7/544 , G06F9/50 , G06F17/18 , G06F9/38 , G06F12/0891 , G06F12/06 , G06F12/0888 , G06F12/0802 , G06T1/60 , G06F12/0871 , G06T1/20 , H03M7/46 , G06F12/0875 , G06F12/0862 , G06F15/80 , G06F12/0897 , G06F12/0893 , G06F12/0804 , G06F12/0882 , G06F7/575 , G06F12/1009 , G06F12/0895 , G06F7/58 , G06T15/06 , G06N3/08

CPC classification number: G06F15/7839 , G06F9/30043 , G06F12/128 , G06F17/16 , G06F12/0811 , G06F12/0238 , G06F12/0866 , G06F9/30014 , G06F7/5443 , G06F9/5077 , G06F12/0246 , G06F17/18 , G06F9/3887 , G06F12/0891 , G06F12/0607 , G06F12/0888 , G06F12/0802 , G06T1/60 , G06F9/30079 , G06F12/0871 , G06F9/30036 , G06T1/20 , H03M7/46 , G06F12/0215 , G06F12/0875 , G06F12/0862 , G06F15/8046 , G06F9/30047 , G06F9/30065 , G06F12/0897 , G06F9/5011 , G06F12/0893 , G06F12/0804 , G06F12/0882 , G06F9/3001 , G06F7/575 , G06F12/1009 , G06F9/3004 , G06F12/0895 , G06F7/588 , G06F2212/401 , G06F2212/1044 , G06F9/3867 , G06F9/3818 , G06F9/3802 , G06F2212/455 , G06F2212/1021 , G06F2212/60 , G06F2212/1008 , G06T15/06 , G06N3/08 , G06F2212/302

Abstract: Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.

43.

发明授权
Sparse optimizations for a matrix accelerator architecture 有权

公开(公告)号：US11676239B2

公开(公告)日：2023-06-13

申请号：US17303654

申请日：2021-06-03

Applicant: Intel Corporation

Inventor： Joydeep Ray , Scott Janus , Varghese George , Subramaniam Maiyuran , Altug Koker , Abhishek Appu , Prasoonkumar Surti , Vasanth Ranganathan , Andrei Valentin , Ashutosh Garg , Yoav Harel , Arthur Hunter, Jr. , SungYe Kim , Mike Macpherson , Elmoustapha Ould-Ahmed-Vall , William Sadler , Lakshminarayanan Striramassarma , Vikranth Vemulapalli

IPC: G06T1/20 , G06F9/50 , G06F12/0806 , G06F15/80 , G06F17/16 , G06F7/544 , G06N3/04 , G06N3/08 , G06N3/084 , G06N3/048

CPC classification number: G06T1/20 , G06F7/5443 , G06F9/5027 , G06F12/0806 , G06F15/8046 , G06F17/16 , G06N3/048 , G06N3/08 , G06N3/084

Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiment described herein provided techniques to skip computational operations for zero filled matrices and sub-matrices. Embodiments additionally provide techniques to maintain data compression through to a processing unit. Embodiments additionally provide an architecture for a sparse aware logic unit.

44.

发明授权
Instruction and logic for systolic dot product with accumulate 有权

公开(公告)号：US11640297B2

公开(公告)日：2023-05-02

申请号：US17304153

申请日：2021-06-15

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Guei-Yuan Lueh , Supratim Pal , Ashutosh Garg , Chandra S. Gurram , Jorge E. Parra , Junjie Gu , Konrad Trifunovic , Hong Bin Liao , Mike B. MacPherson , Shubh B. Shah , Shubra Marwaha , Stephen Junkins , Timothy R. Bauer , Varghese George , Weiyu Chen

IPC: G06F9/30 , G06T1/20 , G06F9/38

Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.

45.

发明授权
Instructions and logic for vector multiply add with zero skipping 有权

公开(公告)号：US11314515B2

公开(公告)日：2022-04-26

申请号：US16724831

申请日：2019-12-23

Applicant: Intel Corporation

Inventor： Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George

IPC: G06F9/38 , G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

46.

发明授权
Instruction and logic for systolic dot product with accumulate 有权

公开(公告)号：US11042370B2

公开(公告)日：2021-06-22

申请号：US15957728

申请日：2018-04-19

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Guei-Yuan Lueh , Supratim Pal , Ashutosh Garg , Chandra S. Gurram , Jorge E. Parra , Junjie Gu , Konrad Trifunovic , Hong Bin Liao , Mike B. Macpherson , Shubh B. Shah , Shubra Marwaha , Stephen Junkins , Timothy R. Bauer , Varghese George , Weiyu Chen

IPC: G06F9/30 , G06T1/20 , G06F9/38

Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.

47.

发明申请
UTILIZING STRUCTURED SPARSITY IN SYSTOLIC ARRAYS 有权

公开(公告)号：US20210081201A1

公开(公告)日：2021-03-18

申请号：US17107823

申请日：2020-11-30

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Jorge Parra , Ashutosh Garg , Chandra Gurram , Chunhui Mei , Durgesh Borkar , Shubra Marwaha , Supratim Pal , Varghese George , Wei Xiong , Yan Li , Yongsheng Liu , Dipankar Das , Sasikanth Avancha , Dharma Teja Vooturi , Naveen K. Mellempudi

IPC: G06F9/30 , G06F9/38 , G06F15/80

Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.

48.

发明授权
Handling instructions that require adding results of a plurality of multiplications 有权

公开(公告)号：US09904513B2

公开(公告)日：2018-02-27

申请号：US14749838

申请日：2015-06-25

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Jorge F. Garcia Pabon , Ashutosh Garg

IPC: G06F7/38 , G06F7/485 , G06F7/544

CPC classification number: G06F7/485 , G06F7/5443

Abstract: Floating point compound equations that involve addition of at least three terms, where each term involves a multiplication, can be implemented by using a bypass to prevent small, remaining values from being lost when shifted.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification