Patent search ap:("Intel Corporation") AND inv:"Jorge Parra" Page 4

31.

发明公开
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT 审中-公开

公开(公告)号：US20240362180A1

公开(公告)日：2024-10-31

申请号：US18647549

申请日：2024-04-26

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46

CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06

Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.

32.

发明公开
SHARED LOCAL REGISTERS FOR THREAD TEAM PROCESSING 审中-公开

公开(公告)号：US20240112295A1

公开(公告)日：2024-04-04

申请号：US17958216

申请日：2022-09-30

Applicant: Intel Corporation

Inventor： Biju George , Fangwen Fu , Supratim Pal , Jorge Parra , Chunhui Mei , Maxim Kazakov , Joydeep Ray

IPC: G06T1/20 , G06F9/30 , G06F9/38

CPC classification number: G06T1/20 , G06F9/30098 , G06F9/3836

Abstract: Shared local registers for thread team processing is described. An example of an apparatus includes one or more processors including a graphic processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space that may be directly reference in the ISA instructions to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.

33.

发明申请
REGISTER FILE FOR SYSTOLIC ARRAY 有权

公开(公告)号：US20220413851A1

公开(公告)日：2022-12-29

申请号：US17304794

申请日：2021-06-25

Applicant: Intel Corporation

Inventor： Chandra Gurram , Wei-yu Chen , Fangwen Fu , Sabareesh Ganapathy , Varghese George , Guei-Yuan Lueh , Subramaniam Maiyuran , Mike Macpherson , Supratim Pal , Jorge Parra

IPC: G06F9/30 , G06F17/16 , G06F7/483

Abstract: A processing apparatus includes a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.

34.

发明申请
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT 有权

公开(公告)号：US20220365901A1

公开(公告)日：2022-11-17

申请号：US17827067

申请日：2022-05-27

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06F15/78 , G06F9/30 , G06F9/38 , G06F17/18 , G06F12/0802 , G06F7/544 , G06F7/575 , G06F12/02 , G06F12/0866 , G06F12/0875 , G06F12/0895 , G06F12/128 , G06F12/06 , G06F12/1009 , G06T1/20 , G06T1/60 , H03M7/46 , G06F12/0811 , G06F15/80 , G06F17/16 , G06F7/58 , G06F12/0871 , G06F12/0862 , G06F12/0897 , G06F9/50 , G06F12/0804 , G06F12/0882 , G06F12/0891 , G06F12/0893

Abstract: Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

35.

发明授权
Method and apparatus for approximation using polynomials 有权

公开(公告)号：US11327754B2

公开(公告)日：2022-05-10

申请号：US16366941

申请日：2019-03-27

Applicant: Intel Corporation

Inventor： Jorge Parra , Dan Baum , Robert S. Chappell , Michael Espig , Varghese George , Alexander Heinecke , Christopher Hughes , Subramaniam Maiyuran , Prasoonkumar Surti , Ronen Zohar , Elmoustapha Ould-Ahmed-Vall

IPC: G06F9/30 , G06F17/11 , G06F7/544 , G06F9/38 , G06F7/552

Abstract: Methods and apparatus for approximation using polynomial functions are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry is to decode an instruction, where the instruction comprises a first operand specifying an output location and a second operand specifying a plurality of data element values to be computed. The execution circuitry is to execute the decoded instruction. The execution includes to compute a result for each of the plurality of data element values using a polynomial function to approximate a complex function, where the computation uses coefficients stored in a lookup location for the complex function, and where data element values within different data element value ranges use different sets of coefficients. The execution further includes to store results of the computation in the output location.

36.

发明申请
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT 有权

公开(公告)号：US20220129266A1

公开(公告)日：2022-04-28

申请号：US17428523

申请日：2020-03-14

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06F9/30 , G06F7/544 , G06F12/02 , G06F12/0811 , G06F12/0875

Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and
a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.

37.

发明申请
COMPUTING EFFICIENT CROSS CHANNEL OPERATIONS IN PARALLEL COMPUTING MACHINES USING SYSTOLIC ARRAYS 有权

公开(公告)号：US20220058158A1

公开(公告)日：2022-02-24

申请号：US17518202

申请日：2021-11-03

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Chandra Gurram

IPC: G06F15/80

Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.

38.

发明申请
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT 有权

公开(公告)号：US20210312697A1

公开(公告)日：2021-10-07

申请号：US17304092

申请日：2021-06-14

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06T15/06 , G06F9/38 , G06F17/18 , G06F9/30

Abstract: Described herein is a graphics processing unit (GPU) comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent and process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add a first source operand to a result of the multiply.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification