Patent search ap:("INTEL CORPORATION") AND inv:"Ashutosh Garg" Page 4

31.

Granted invention patent
Scalable sparse matrix multiply acceleration using systolic arrays with feedback inputs (Active)

Publication number: US11636174B2

Publication date: 2023-04-25

Application number: US17527882

Application date: 2021-11-16

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Ashutosh Garg , Shubra Marwaha , Chandra Gurram , Darin Starkey , Durgesh Borkar , Varghese George

IPC: G06F17/16 , G06F9/30 , G06F15/80

Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.

32.

Invention patent application
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING (Active)

Publication number: US20220326953A1

Publication date: 2022-10-13

Application number: US17723312

Application date: 2022-04-18

Applicant: Intel Corporation

Inventor: Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George

IPC: G06F9/38 , G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
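
As a rough illustration of the zero-skipping idea named in this abstract, the following Python sketch accumulates products and skips any pair in which an operand is zero. It is only a software analogy under stated assumptions: the function name `fma_zero_skip` is hypothetical, and the predicate mask, repeat count, and macro-instruction encoding mentioned in the abstract are not modeled.

```python
def fma_zero_skip(acc, a, b):
    """Accumulate a[i] * b[i] into acc, skipping pairs with a zero operand.

    Hypothetical helper for illustration only; it does not model the
    hardware macro instruction described in the abstract.
    Returns (accumulated value, number of multiplies actually performed).
    """
    performed = 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            # A zero operand contributes nothing, so the multiply is skipped.
            continue
        acc += x * y
        performed += 1
    return acc, performed


if __name__ == "__main__":
    total, multiplies = fma_zero_skip(0, [1, 0, 3, 0], [4, 5, 0, 7])
    print(total, multiplies)
```

For sparse inputs, the count of performed multiplies can be far smaller than the vector length, which is the saving the sketch is meant to make visible.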

33.

Granted invention patent
Compiler assisted register file write reduction (Active)

Publication number: US11321799B2

Publication date: 2022-05-03

Application number: US16726659

Application date: 2019-12-24

Applicant: Intel Corporation

Inventor: Chandra S. Gurram , Gang Y. Chen , Subramaniam Maiyuran , Supratim Pal , Ashutosh Garg , Jorge E. Parra , Darin M. Starkey , Guei-Yuan Lueh , Wei-Yu Chen

IPC: G06T1/20 , G06T1/60

Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as single write instead of multiple separate partial writes.
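
The grouping of partial writes described in this abstract can be pictured with a small Python sketch. It assumes a register modeled as a list of lanes and partial writes given as `(offset, values)` pairs; the function name `coalesce_partial_writes` and this data layout are hypothetical and are not taken from the patent.

```python
def coalesce_partial_writes(register, writes):
    """Merge several partial writes and return the register after one combined update.

    register: list of lane values (hypothetical model of one register).
    writes:   list of (offset, values) partial writes to that register.
    """
    merged = list(register)  # work on a copy; the caller commits it as a single write
    for offset, values in writes:
        end = offset + len(values)
        if offset < 0 or end > len(merged):
            raise ValueError("partial write falls outside the register")
        merged[offset:end] = values
    return merged


if __name__ == "__main__":
    # Two half-register writes combined into one full-register update.
    print(coalesce_partial_writes([0] * 8, [(0, [1, 2, 3, 4]), (4, [5, 6, 7, 8])]))
```

In this model the register file sees one update instead of two, which mirrors the single-write outcome the abstract describes.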

34.

Invention patent application
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING (Active)

Publication number: US20210191724A1

Publication date: 2021-06-24

Application number: US16724831

Application date: 2019-12-23

Applicant: Intel Corporation

Inventor: Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George

IPC: G06F9/38 , G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

35.

Invention patent application
DOT PRODUCT MULTIPLIER MECHANISM (Active)

Publication number: US20210141857A1

Publication date: 2021-05-13

Application number: US16682225

Application date: 2019-11-13

Applicant: Intel Corporation

Inventor: Nevin Mathew , Shubra Marwaha , Ashutosh Garg

IPC: G06F17/16 , G06N20/10 , G06N3/08 , G06F9/30

Abstract: An apparatus to facilitate matrix multiplication operations. The apparatus comprises multiplication hardware to operate in a dot product mode, wherein a multiplication stage included in the multiplication hardware is configured as a dot product of a number of bit vectors (N) to perform N×N multiplication operations on a plurality of multiplicands and perform addition operations on results of the N×N multiplication operations.

36.

Invention patent application
SPARSE OPTIMIZATIONS FOR A MATRIX ACCELERATOR ARCHITECTURE (Active)

Publication number: US20210035258A1

Publication date: 2021-02-04

Application number: US17064427

Application date: 2020-10-06

Applicant: Intel Corporation

Inventor: Joydeep Ray , Scott Janus , Varghese George , Subramaniam Maiyuran , Altug Koker , Abhishek Appu , Prasoonkumar Surti , Vasanth Ranganathan , Andrei Valentin , Ashutosh Garg , Yoav Harel , Arthur Hunter, JR. , SungYe Kim , Mike Macpherson , Elmoustapha Ould-Ahmed-Vall , William Sadler , Lakshminarayanan Striramassarma , Vikranth Vemulapalli

IPC: G06T1/20 , G06F9/50 , G06F15/80 , G06F12/0806 , G06N3/04 , G06N3/08 , G06F17/16

Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiment described herein provided techniques to skip computational operations for zero filled matrices and sub-matrices. Embodiments additionally provide techniques to maintain data compression through to a processing unit. Embodiments additionally provide an architecture for a sparse aware logic unit.

37.

Granted invention patent
Register bank conflict reduction for multi-threaded processor (Active)

Publication number: US10754651B2

Publication date: 2020-08-25

Application number: US16023713

Application date: 2018-06-29

Applicant: Intel Corporation

Inventor: Chandra Gurram , Subramaniam Maiyuran , Buqi Cheng , Ashutosh Garg , Guei-Yuan Lueh , Wei-Yu Chen

IPC: G06F9/30 , G06T1/20 , G06T15/00 , G06F9/38

Abstract: Embodiments are generally directed to register bank conflict reduction for multi-threaded processor execution units. An embodiment of an apparatus includes a processor including one or more execution units (EUs), at least a first execution unit (EU) to process a plurality of threads, the first EU including a register file including multiple register banks with each register bank including multiple registers, and one or more read multiplexers to read registers from the register file, wherein attempting to read more than one register from a single register bank of the register file in a same clock cycle generates a register bank conflict. Registers for each thread for the first EU are distributed across the registers banks within the register file such that a first register for a first thread of the plurality of threads and a following second register for the first thread are located in different register banks within the register file.
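
One simple way to picture the property stated in this abstract (a register and the one that follows it land in different banks) is an interleaved mapping. The Python sketch below assumes a modulo mapping from register index to bank; the names `bank_of` and `count_conflicts` and the mapping itself are illustrative assumptions, not the scheme claimed in the patent.

```python
from collections import Counter


def bank_of(reg_index, num_banks):
    """Assumed interleaved mapping: consecutive registers go to consecutive banks."""
    return reg_index % num_banks


def count_conflicts(reg_indices, num_banks):
    """Count extra same-cycle reads that hit a bank already being read."""
    per_bank = Counter(bank_of(r, num_banks) for r in reg_indices)
    return sum(n - 1 for n in per_bank.values() if n > 1)


if __name__ == "__main__":
    print(count_conflicts([4, 5], 2))  # consecutive registers, different banks
    print(count_conflicts([4, 6], 2))  # both map to the same bank
```

Under this assumed mapping, reading two consecutive registers in one cycle never conflicts as long as there are at least two banks, while reads whose indices differ by a multiple of the bank count still collide.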

38.

Granted invention patent
Software scoreboard information and synchronization (Active)

Publication number: US10692170B2

Publication date: 2020-06-23

Application number: US16437961

Application date: 2019-06-11

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Supratim Pal , Jorge E. Parra , Chandra S. Gurram , Ashwin J. Shivani , Ashutosh Garg , Brent A. Schwartz , Jorge F. Garcia Pabon , Darin M. Starkey , Shubh B. Shah , Guei-Yuan Lueh , Kaiyu Chen , Konrad Trifunovic , Buqi Cheng , Weiyu Chen

IPC: G06F9/38 , G06F8/41 , G06T1/20 , G06F9/30 , G06T1/60 , G09G5/36 , G06T15/00

Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.

39.

Invention patent application
REGISTER BANK CONFLICT REDUCTION FOR MULTI-THREADED PROCESSOR (Under examination, published)

Publication number: US20200004534A1

Publication date: 2020-01-02

Application number: US16023713

Application date: 2018-06-29

Applicant: Intel Corporation

Inventor: Chandra Gurram , Subramaniam Maiyuran , Buqi Cheng , Ashutosh Garg , Guei-Yuan Lueh , Wei-Yu Chen

IPC: G06F9/30 , G06F9/38 , G06T15/00 , G06T1/20

Abstract: Embodiments are generally directed to register bank conflict reduction for multi-threaded processor execution units. An embodiment of an apparatus includes a processor including one or more execution units (EUs), at least a first execution unit (EU) to process a plurality of threads, the first EU including a register file including multiple register banks with each register bank including multiple registers, and one or more read multiplexers to read registers from the register file, wherein attempting to read more than one register from a single register bank of the register file in a same clock cycle generates a register bank conflict. Registers for each thread for the first EU are distributed across the registers banks within the register file such that a first register for a first thread of the plurality of threads and a following second register for the first thread are located in different register banks within the register file.

40.

Invention patent application
SOFTWARE SCOREBOARD INFORMATION AND SYNCHRONIZATION (Under examination, published)

Publication number: US20190362460A1

Publication date: 2019-11-28

Application number: US16437961

Application date: 2019-06-11

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran , Supratim Pal , Jorge E. Parra , Chandra S. Gurram , Ashwin J. Shivani , Ashutosh Garg , Brent A. Schwartz , Jorge F. Garcia Pabon , Darin M. Starkey , Shubh B. Shah , Guei-Yuan Lueh , Kaiyu Chen , Konrad Trifunovic , Buqi Cheng , Weiyu Chen

IPC: G06T1/20 , G06F9/30 , G06F9/38 , G06F8/41

Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.
