Patent search ap:("INTEL CORPORATION") AND inv:"Supratim Pal" Page 4

31.

发明授权
Dual pipeline parallel systolic array 有权

公开(公告)号：US12189571B2

公开(公告)日：2025-01-07

申请号：US17304797

申请日：2021-06-25

Applicant: Intel Corporation

Inventor： Jorge Parra , Jiasheng Chen , Supratim Pal , Fangwen Fu , Sabareesh Ganapathy , Chandra Gurram , Chunhui Mei , Yue Qi

IPC: G06F15/80 , G06F9/30 , G06F9/38

Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.

32.

发明授权
Computing efficient cross channel operations in parallel computing machines using systolic arrays 有权

公开(公告)号：US12093213B2

公开(公告)日：2024-09-17

申请号：US18310129

申请日：2023-05-01

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Chandra Gurram

IPC: G06F15/80 , G06F17/16 , G06N20/00

CPC classification number: G06F15/8046 , G06F15/8007 , G06F17/16 , G06N20/00

Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.

33.

发明授权
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format 有权

公开(公告)号：US12007935B2

公开(公告)日：2024-06-11

申请号：US17428523

申请日：2020-03-14

Applicant: INTEL CORPORATION

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06F9/30 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/78 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06

CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06

Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and
a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.

34.

发明公开
ENHANCEMENTS FOR ACCUMULATOR USAGE AND INSTRUCTION FORWARDING IN MATRIX MULTIPLY PIPELINE IN GRAPHICS ENVIRONMENT 审中-公开

公开(公告)号：US20240169021A1

公开(公告)日：2024-05-23

申请号：US18056930

申请日：2022-11-18

Applicant: Intel Corporation

Inventor： Jorge Eduardo Parra Osorio , Supratim Pal , Fangwen Fu , Guei-Yuan Lueh , Po-Yu Chen , Jiasheng Chen

IPC: G06F17/16 , G06F7/544

CPC classification number: G06F17/16 , G06F7/5443

Abstract: An apparatus to facilitate enhancements for accumulator usage and instruction forwarding in matrix multiply pipeline in graphics environment is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units comprise: multiply-accumulate hardware to generate intermediate results of a matrix multiplication operation; intermediate accumulation hardware to store the intermediate results of the matrix multiplication operation and accumulate with other intermediate results generated by the multiply-accumulate hardware; a bypass data structure to cause a source operand to bypass the multiply-accumulate hardware; and an adder circuit to add an output from the multiply-accumulate hardware with at least one of the source operand or an output of the intermediate accumulation hardware to generate a final output.

35.

发明公开
SUPPORTING AND LOAD BALANCING MULTIPLE DOUBLE PRECISION PIPELINES IN A GRAPHICS ENVIRONMENT 审中-公开

公开(公告)号：US20240168764A1

公开(公告)日：2024-05-23

申请号：US18056820

申请日：2022-11-18

Applicant: Intel Corporation

Inventor： Supratim Pal , Jiasheng Chen , Vikranth Vemulapalli , Subramaniam Maiyuran

IPC: G06F9/30 , G06F9/38

CPC classification number: G06F9/30014 , G06F9/3867

Abstract: An apparatus to facilitate supporting and load balancing multiple double precision pipelines in a graphics environment is disclosed. The apparatus includes a processing core having at least one processing resource comprising: a first double precision (DP) pipeline to support double float operations, the first DP pipeline comprising a first set of floating point units (FPUs) configured in a pipelined configuration to enable new instructions to be issued to the first DP pipeline before previous instructions are complete; and a second DP pipeline to support the double float operations, wherein the second DP pipeline comprising a second set of FPUs configured in a pipelined configuration to enable new instructions to be issued to the first DP pipeline before previous instructions are complete.

36.

发明公开
MATRIX TRANSPOSITION IN MATRIX MULTIPLICATION ARRAY CIRCUITRY 审中-公开

公开(公告)号：US20240168723A1

公开(公告)日：2024-05-23

申请号：US18056822

申请日：2022-11-18

Applicant: Intel Corporation

Inventor： Jorge Eduardo Parra Osorio , Supratim Pal , Jiasheng Chen

IPC: G06F7/78 , G06F17/16

CPC classification number: G06F7/78 , G06F17/16

Abstract: An apparatus to facilitate matrix transposition in matrix multiplication array circuitry is disclosed. The apparatus includes a processor comprising matrix acceleration hardware comprising storage buffers and an array of data processing units (DPUs), wherein the matrix acceleration hardware is to: load data for a source matrix to the storage buffers; generate a transposed matrix corresponding comprising transposed elements of the source matrix; and input the transposed matrix to the array of DPUs for a matrix multiplication operation.

37.

发明授权
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format 有权

公开(公告)号：US11954063B2

公开(公告)日：2024-04-09

申请号：US18170900

申请日：2023-02-17

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Shubra Marwaha , Ashutosh Garg , Supratim Pal , Jorge Parra , Chandra Gurram , Varghese George , Darin Starkey , Guei-Yuan Lueh

IPC: G06T15/06 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/78 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08

CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06

Abstract: Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.

38.

发明授权
Instructions and logic for vector multiply add with zero skipping 有权

公开(公告)号：US11669329B2

公开(公告)日：2023-06-06

申请号：US17723312

申请日：2022-04-18

Applicant: Intel Corporation

Inventor： Supratim Pal , Sasikanth Avancha , Ishwar Bhati , Wei-Yu Chen , Dipankar Das , Ashutosh Garg , Chandra S. Gurram , Junjie Gu , Guei-Yuan Lueh , Subramaniam Maiyuran , Jorge E. Parra , Sudarshan Srinivasan , Varghese George

IPC: G06F9/38 , G06F9/30

CPC classification number: G06F9/3802 , G06F9/3001 , G06F9/30018 , G06F9/30145

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

39.

发明申请
USE OF A SINGLE INSTRUCTION SET ARCHITECTURE (ISA) INSTRUCTION FOR VECTOR NORMALIZATION 有权

公开(公告)号：US20220147316A1

公开(公告)日：2022-05-12

申请号：US17477939

申请日：2021-09-17

Applicant: Intel Corporation

Inventor： Abhishek Rhisheekesan , Supratim Pal , Shashank Lakshminarayana , Subramaniam Maiyuran

IPC: G06F7/552 , G06F9/30

Abstract: Embodiments described herein are generally directed to an improved vector normalization instruction. An embodiment of a method includes responsive to receipt by a GPU of a single instruction specifying a vector normalization operation to be performed on V vectors: (i) generating V squared length values, N at a time, by a first processing unit, by, for each N sets of inputs, each representing multiple component vectors for N of the vectors, performing N parallel dot product operations on the N sets of inputs. Generating V sets of outputs representing multiple normalized component vectors of the V vectors, N at a time, by a second processing unit, by, for each N squared length values of the V squared length values, performing N parallel operations on the N squared length values, wherein each of the N parallel operations implement a combination of a reciprocal square root function and a vector scaling function.

40.

发明授权
Scalable sparse matrix multiply acceleration using systolic arrays with feedback inputs 有权

公开(公告)号：US11204977B2

公开(公告)日：2021-12-21

申请号：US16913800

申请日：2020-06-26

Applicant: Intel Corporation

Inventor： Subramaniam Maiyuran , Jorge Parra , Supratim Pal , Ashutosh Garg , Shubra Marwaha , Chandra Gurram , Darin Starkey , Durgesh Borkar , Varghese George

IPC: G06F17/16 , G06F15/80 , G06F9/30

Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification