Patent search ap:("INTEL CORPORATION") AND inv:"Chandra Gurram" Page 4

31.

发明公开
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT 审中-公开

公开(公告)号：US20240362180A1

公开(公告)日：2024-10-31

申请号：US18647549

申请日：2024-04-26

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46

CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06

Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.

32.

发明申请
GATHERING PAYLOAD FROM ARBITRARY REGISTERS FOR SEND MESSAGES IN A GRAPHICS ENVIRONMENT 有权

公开(公告)号：US20230088743A1

公开(公告)日：2023-03-23

申请号：US17481448

申请日：2021-09-22

Applicant: Intel Corporation

Inventor： Supratim Pal , Chandra Gurram , Fan-Yin Tzeng , Subramaniam Maiyuran , Guei-Yuan Lueh , Timothy R. Bauer , Vikranth Vemulapalli , Wei-Yu Chen

IPC: G06F9/30 , G06F9/38 , G06F12/02 , G06T1/20

Abstract: An apparatus to facilitate gathering payload from arbitrary registers for send messages in a graphics environment is disclosed. The apparatus includes processing resources comprising execution circuitry to receive a send gather message instruction identifying a number of registers to access for a send message and identifying IDs of a plurality of individual registers corresponding to the number of registers; decode a first phase of the send gather message instruction; based on decoding the first phase, cause a second phase of the send gather message instruction to bypass an instruction decode stage; and dispatch the first phase subsequently followed by dispatch of the second phase to a send pipeline. The apparatus can also perform an immediate move of the IDs of the plurality of individual registers to an architectural register of the execution circuitry and include a pointer to the architectural register in the send gather message instruction.

33.

发明申请
REGISTER FILE FOR SYSTOLIC ARRAY 有权

公开(公告)号：US20220413851A1

公开(公告)日：2022-12-29

申请号：US17304794

申请日：2021-06-25

Applicant: Intel Corporation

Inventor： Chandra Gurram , Wei-yu Chen , Fangwen Fu , Sabareesh Ganapathy , Varghese George , Guei-Yuan Lueh , Subramaniam Maiyuran , Mike Macpherson , Supratim Pal , Jorge Parra

IPC: G06F9/30 , G06F17/16 , G06F7/483

Abstract: A processing apparatus includes a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.

34.

发明申请
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT 有权

公开(公告)号：US20220365901A1

公开(公告)日：2022-11-17

申请号：US17827067

申请日：2022-05-27

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06F15/78 , G06F9/30 , G06F9/38 , G06F17/18 , G06F12/0802 , G06F7/544 , G06F7/575 , G06F12/02 , G06F12/0866 , G06F12/0875 , G06F12/0895 , G06F12/128 , G06F12/06 , G06F12/1009 , G06T1/20 , G06T1/60 , H03M7/46 , G06F12/0811 , G06F15/80 , G06F17/16 , G06F7/58 , G06F12/0871 , G06F12/0862 , G06F12/0897 , G06F9/50 , G06F12/0804 , G06F12/0882 , G06F12/0891 , G06F12/0893

Abstract: Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

35.

发明申请
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT 有权

公开(公告)号：US20220129266A1

公开(公告)日：2022-04-28

申请号：US17428523

申请日：2020-03-14

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06F9/30 , G06F7/544 , G06F12/02 , G06F12/0811 , G06F12/0875

Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and
a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.

36.

发明申请
COMPUTING EFFICIENT CROSS CHANNEL OPERATIONS IN PARALLEL COMPUTING MACHINES USING SYSTOLIC ARRAYS 有权

公开(公告)号：US20220058158A1

公开(公告)日：2022-02-24

申请号：US17518202

申请日：2021-11-03

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Chandra Gurram

IPC: G06F15/80

Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.

37.

发明申请
COMPACTION OF DIVERGED LANES FOR EFFICIENT USE OF ALUS 有权

公开(公告)号：US20210349717A1

公开(公告)日：2021-11-11

申请号：US16914030

申请日：2020-06-26

Applicant: Intel Corporation

Inventor： Chandra Gurram , Subramaniam Maiyuran , Supratim Pal , Saurabh Sharma , Aditya Navale

IPC: G06F9/30 , G06F9/38 , G06F7/57 , G06F1/14

Abstract: Described herein is an accelerator device in which compaction of diverged lanes of a parallel processor is enabled to increase the efficiency of ALU utilization. One embodiment provides an accelerator device comprising a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including a parallel processing architecture configured to enable compaction of diverged lanes.

38.

发明授权
Systems and methods for reducing register bank conflicts based on a software hint bit causing a hardware thread switch 有权

公开(公告)号：US11163578B2

公开(公告)日：2021-11-02

申请号：US15903549

申请日：2018-02-23

Applicant: Intel Corporation

Inventor： Buqi Cheng , Wei-Yu Chen , Guei-Yuan Lueh , Chandra Gurram , Subramaniam Maiyuran

IPC: G06F9/38 , G06F8/41 , G06F9/30

Abstract: Mechanisms for reducing register bank conflicts based on software hint and hardware thread switch are disclosed. In some embodiments, an apparatus for thread switching includes a graphics processing unit (GPU) that includes a plurality of register banks to store operands that are assigned at least partially to avoid register bank conflicts. Decoding circuitry checks a thread switching field of a first instruction to be executed by a first thread. The GPU performs a thread switch mechanism to cause a second instruction to be executed by a second thread when the thread switching field of the first instruction is set.

39.

发明申请
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT 有权

公开(公告)号：US20210312697A1

公开(公告)日：2021-10-07

申请号：US17304092

申请日：2021-06-14

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06T15/06 , G06F9/38 , G06F17/18 , G06F9/30

Abstract: Described herein is a graphics processing unit (GPU) comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent and process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add a first source operand to a result of the multiply.

40.

发明申请
RESOURCE LOAD BALANCING BASED ON USAGE AND POWER LIMITS 审中-公开

公开(公告)号：US20190204894A1

公开(公告)日：2019-07-04

申请号：US15859598

申请日：2017-12-31

Applicant: Intel Corporation

Inventor： Sanjeev Jahagirdar , Altug Koker , Yoav Harel , Kenneth Brand , Chandra Gurram , Eric Finley , Bhushan Borole , Carlos Nava Rodriguez

IPC: G06F1/32

CPC classification number: G06F1/324

Abstract: Methods and apparatus relating to techniques for resource load balancing based on usage and/or power limits are described. In an embodiment, resource load balancing logic causes a first resource of a processor to operate at a first frequency and a second resource of the processor to operate at a second frequency. Memory stores a plurality of frequency values. The resource load balancing logic also selects the first frequency and the second frequency based on the stored plurality of frequency values. Operation of the first resource at the first frequency and the second resource at the second frequency in turn causes the processor to operate under a power budget. The resource load balancing logic causes change to the first frequency and the second frequency in response to a determination that operation of the processor is different than the power budget. Other embodiments are also disclosed and claimed.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification