Patent search ap:("INTEL CORPORATION") AND inv:"Supratim Pal" Page 2

11.

发明授权
Register sharing mechanism 有权

公开(公告)号：US10983794B2

公开(公告)日：2021-04-20

申请号：US16443285

申请日：2019-06-17

Applicant: Intel Corporation

Inventor： Guei-Yuan Lueh , Subramaniam Maiyuran , Weiyu Chen , Konrad Trifunovic , Supratim Pal , Chandra S. Gurram , Jorge E. Parra , Pratik J. Ashar , Tomasz Bujewski

IPC: G06F9/30 , G06F9/54 , G06F9/48 , G06F12/1009 , G06F9/50

Abstract: An processor to facilitate register sharing is disclosed. The processor includes a plurality of execution units (EUs), each including a General Purpose Register File (GRF) having a plurality of registers; and register sharing hardware to divide the plurality of registers into a first set of registers dedicated for execution of a first set of threads and a second set of registers shared for execution of a second set of threads.

12.

发明申请
UTILIZING STRUCTURED SPARSITY IN SYSTOLIC ARRAYS 有权

公开(公告)号：US20210081201A1

公开(公告)日：2021-03-18

申请号：US17107823

申请日：2020-11-30

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Jorge Parra , Ashutosh Garg , Chandra Gurram , Chunhui Mei , Durgesh Borkar , Shubra Marwaha , Supratim Pal , Varghese George , Wei Xiong , Yan Li , Yongsheng Liu , Dipankar Das , Sasikanth Avancha , Dharma Teja Vooturi , Naveen K. Mellempudi

IPC: G06F9/30 , G06F9/38 , G06F15/80

Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.

13.

发明授权
Large integer multiplication enhancements for graphics environment 有权

公开(公告)号：US12236238B2

公开(公告)日：2025-02-25

申请号：US17358867

申请日：2021-06-25

Applicant: Intel Corporation

Inventor： Supratim Pal , Li-An Tang , Changwon Rhee , Timothy R. Bauer , Alexander Lyashevsky , Jiasheng Chen

IPC: G06F9/30 , G06F7/44 , G06F7/506 , G06F7/57 , G06F9/38

Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.

14.

发明授权
Using sparsity metadata to reduce systolic array power consumption 有权

公开(公告)号：US12190158B2

公开(公告)日：2025-01-07

申请号：US17358542

申请日：2021-06-25

Applicant: Intel Corporation

Inventor： Jorge Parra , Supratim Pal , Jiasheng Chen , Chandra Gurram

IPC: G06F7/50 , G06F1/329 , G06F7/523 , G06F7/544 , G06F9/38 , G06F9/50 , G06F15/80 , G06F17/16 , G06T1/20

Abstract: A processing apparatus can include a general-purpose parallel processing engine comprising a matrix accelerator including a multi-stage systolic array, where each stage includes multiple processing elements associated with multiple processing channels. The multiple processing elements are configured to receive output sparsity metadata that is independent of input sparsity of input matrix elements and perform processing operations on the input matrix elements based on the output sparsity metadata.

15.

发明授权
Systolic array of arbitrary physical and logical depth 有权

公开(公告)号：US12174783B2

公开(公告)日：2024-12-24

申请号：US17304678

申请日：2021-06-24

Applicant: Intel Corporation

Inventor： Jorge Parra , Wei-yu Chen , Kaiyu Chen , Varghese George , Junjie Gu , Chandra Gurram , Guei-Yuan Lueh , Stephen Junkins , Subramaniam Maiyuran , Supratim Pal

IPC: G06F15/80 , G06F9/50 , G06F9/54 , G06T1/20

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

16.

发明公开
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT 审中-公开

公开(公告)号：US20240362180A1

公开(公告)日：2024-10-31

申请号：US18647549

申请日：2024-04-26

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46

CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06

Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.

17.

发明公开
SUPPORTING 8-BIT FLOATING POINT FORMAT OPERANDS IN A COMPUTING ARCHITECTURE 审中-公开

公开(公告)号：US20240256274A1

公开(公告)日：2024-08-01

申请号：US18618648

申请日：2024-03-27

Applicant: Intel Corporation

Inventor： Naveen Mellempudi , Subramaniam Maiyuran , Varghese George , Fangwen Fu , Shuai Mu , Supratim Pal , Wei Xiong

IPC: G06F9/30 , G06F9/38 , G06F9/48 , G06F17/16 , G06N20/00

CPC classification number: G06F9/30014 , G06F9/3818 , G06F9/4843 , G06F17/16 , G06N20/00

Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprises one or more sets of interconnected multipliers, shifters, and adder, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.

18.

发明公开
SHARED LOCAL REGISTERS FOR THREAD TEAM PROCESSING 审中-公开

公开(公告)号：US20240112295A1

公开(公告)日：2024-04-04

申请号：US17958216

申请日：2022-09-30

Applicant: Intel Corporation

Inventor： Biju George , Fangwen Fu , Supratim Pal , Jorge Parra , Chunhui Mei , Maxim Kazakov , Joydeep Ray

IPC: G06T1/20 , G06F9/30 , G06F9/38

CPC classification number: G06T1/20 , G06F9/30098 , G06F9/3836

Abstract: Shared local registers for thread team processing is described. An example of an apparatus includes one or more processors including a graphic processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space that may be directly reference in the ISA instructions to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.

19.

发明公开
SYNCHRONIZATION UTILIZING LOCAL TEAM BARRIERS FOR THREAD TEAM PROCESSING 审中-公开

公开(公告)号：US20240111609A1

公开(公告)日：2024-04-04

申请号：US17958213

申请日：2022-09-30

Applicant: Intel Corporation

Inventor： Biju George , Supratim Pal , James Valerio , Vasanth Ranganathan , Fangwen Fu , Chunhui Mei

IPC: G06F9/52 , G06F9/30

CPC classification number: G06F9/522 , G06F9/30098

Abstract: Low-latency synchronization utilizing local team barriers for thread team processing is described. An example of an apparatus includes one or more processors including a graphics processor, the graphics processor including a plurality of processing resources; and memory for storage of data including data for graphics processing, wherein the graphics processor is to receive a request for establishment of a local team barrier for a thread team, the thread team being allocated to a first processing resource, the thread team including multiple threads; determine requirements and designated threads for the local team barrier; and establish the local team barrier in a local register of the first processing resource based at least in part on the requirements and designated threads for the local barrier.

20.

发明申请
GATHERING PAYLOAD FROM ARBITRARY REGISTERS FOR SEND MESSAGES IN A GRAPHICS ENVIRONMENT 有权

公开(公告)号：US20230088743A1

公开(公告)日：2023-03-23

申请号：US17481448

申请日：2021-09-22

Applicant: Intel Corporation

Inventor： Supratim Pal , Chandra Gurram , Fan-Yin Tzeng , Subramaniam Maiyuran , Guei-Yuan Lueh , Timothy R. Bauer , Vikranth Vemulapalli , Wei-Yu Chen

IPC: G06F9/30 , G06F9/38 , G06F12/02 , G06T1/20

Abstract: An apparatus to facilitate gathering payload from arbitrary registers for send messages in a graphics environment is disclosed. The apparatus includes processing resources comprising execution circuitry to receive a send gather message instruction identifying a number of registers to access for a send message and identifying IDs of a plurality of individual registers corresponding to the number of registers; decode a first phase of the send gather message instruction; based on decoding the first phase, cause a second phase of the send gather message instruction to bypass an instruction decode stage; and dispatch the first phase subsequently followed by dispatch of the second phase to a send pipeline. The apparatus can also perform an immediate move of the IDs of the plurality of individual registers to an architectural register of the execution circuitry and include a pointer to the architectural register in the send gather message instruction.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification