Patent search ap:("INTEL CORPORATION") AND inv:"Supratim Pal" Page 9

81.

Invention application
DUAL PIPELINE PARALLEL SYSTOLIC ARRAY (Status: in force)

Publication number: US20220414054A1

Publication date: 2022-12-29

Application number: US17304797

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi

IPC: G06F15/80, G06F9/38

Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.

82.

Invention application
SYSTOLIC ARRAY OF ARBITRARY PHYSICAL AND LOGICAL DEPTH (Status: in force)

Publication number: US20220414053A1

Publication date: 2022-12-29

Application number: US17304678

Application date: 2021-06-24

Applicant: Intel Corporation

Inventor: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal

IPC: G06F15/80, G06F9/50, G06F9/54, G06T1/20

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

83.

Invention application
MULTIPLE REGISTER ALLOCATION SIZES FOR THREADS (Status: in force)

Publication number: US20220413916A1

Publication date: 2022-12-29

Application number: US17358650

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal

IPC: G06F9/50, G06F9/48, G06T1/20

Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
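The dispatch steps listed in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `ProcessingResource`, the function `dispatch_thread`, the first-fit selection policy, and the register counts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProcessingResource:
    """Hypothetical processing resource with a register pool and thread slots."""
    total_registers: int
    num_slots: int
    slots: List[int] = field(default_factory=list)  # registers held per occupied slot

    def free_registers(self) -> int:
        return self.total_registers - sum(self.slots)

    def has_free_slot(self) -> bool:
        return len(self.slots) < self.num_slots


def dispatch_thread(resources: List[ProcessingResource],
                    register_size: int) -> Optional[Tuple[int, int]]:
    """Return (resource index, slot index) for the thread, or None if nothing fits."""
    # Identify resources with enough register space and a free thread slot.
    candidates = [
        i for i, r in enumerate(resources)
        if r.has_free_slot() and r.free_registers() >= register_size
    ]
    if not candidates:
        return None
    # Select one candidate. First fit is an assumption, not taken from the abstract.
    chosen = candidates[0]
    # Select a thread slot and allocate the registers for the thread.
    resources[chosen].slots.append(register_size)
    return chosen, len(resources[chosen].slots) - 1


resources = [ProcessingResource(512, 4), ProcessingResource(512, 4)]
first = dispatch_thread(resources, 256)   # large-register thread
second = dispatch_thread(resources, 256)  # fills the first resource
third = dispatch_thread(resources, 128)   # must go to the second resource
```

Because threads request different register sizes, a resource can run out of registers before it runs out of thread slots, which is why the sketch checks both conditions.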

84.

Invention grant
Control flow mechanism for execution of graphics processor instructions using active channel packing (Status: in force)

Publication number: US11537403B2

Publication date: 2022-12-27

Application number: US17213453

Application date: 2021-03-26

Applicant: Intel Corporation

Inventor: Subramaniam M. Maiyuran, Guei-Yuan Lueh, Supratim Pal, Gang Chen, Ananda V. Kommaraju, Joy Chandra, Altug Koker, Prasoonkumar Surti, David Puffer, Hong Bin Liao, Joydeep Ray, Abhishek R. Appu, Ankur N. Shah, Travis T. Schluessler, Jonathan Kennedy, Devan Burke

IPC: G06F9/38, G06F9/30, G06T1/20

Abstract: An apparatus to facilitate control flow in a graphics processing system is disclosed. The apparatus includes a plurality of execution units to execute single instruction, multiple data (SIMD) instructions and flow control logic to detect a diverging control flow in a plurality of SIMD channels and reduce the execution of the control flow to a subset of the SIMD channels.

85.

Invention application
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING (Status: in force)

Publication number: US20220326953A1

Publication date: 2022-10-13

Application number: US17723312

Application date: 2022-04-18

Applicant: Intel Corporation

Inventor: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George

IPC: G06F9/38, G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
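The idea of zero skipping in a multiply add can be illustrated with a short sketch. It is a simplified software analogy only: the function name `multiply_add_zero_skip` is hypothetical, and the predicate mask and repeat count mentioned in the abstract are not modeled.

```python
def multiply_add_zero_skip(acc, a, b):
    """Accumulate a[i] * b[i] into acc, skipping terms where a[i] is zero.

    Returns (result, number of multiplies actually performed).
    """
    performed = 0
    for x, y in zip(a, b):
        if x == 0:
            # A zero operand contributes nothing, so the multiply is skipped.
            continue
        acc += x * y
        performed += 1
    return acc, performed


# Sparse input: two of the four elements are zero.
result, performed = multiply_add_zero_skip(0, [0, 2, 0, 3], [5, 6, 7, 8])
```

With the sparse input shown, only two of the four multiplies are carried out, while the accumulated value matches a dense multiply add.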

86.

Invention application
SUPPORTING 8-BIT FLOATING POINT FORMAT OPERANDS IN A COMPUTING ARCHITECTURE (Status: in force)

Publication number: US20220318013A1

Publication date: 2022-10-06

Application number: US17212588

Application date: 2021-03-25

Applicant: Intel Corporation

Inventor: Naveen Mellempudi, Subramaniam Maiyuran, Varghese George, Fangwen Fu, Shuai Mu, Supratim Pal, Wei Xiong

IPC: G06F9/30, G06F9/38, G06F9/48

Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating point data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprising one or more sets of interconnected multipliers, shifters, and adders, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.

87.

Invention grant
Sparse matrix optimization mechanism (Status: in force)

Publication number: US11443407B2

Publication date: 2022-09-13

Application number: US17465821

Application date: 2021-09-02

Applicant: Intel Corporation

Inventor: Namita Sharma, Supratim Pal, Biju P. Simon, Tovinakere D. Vivek

IPC: G06T1/60, G06T1/20

Abstract: An apparatus to facilitate matrix processing is disclosed. The apparatus comprises a matrix accelerator to receive input matrix data, transform the input matrix data into a plurality of sub-blocks, examine a first block of the sub-blocks to determine whether the first block comprises sparse data, select a first tile size upon a determination that the first block comprises sparse data and generate output matrix data based on the first tile size.

88.

Invention grant
Compiler assisted register file write reduction (Status: in force)

Publication number: US11321799B2

Publication date: 2022-05-03

Application number: US16726659

Application date: 2019-12-24

Applicant: Intel Corporation

Inventor: Chandra S. Gurram, Gang Y. Chen, Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Jorge E. Parra, Darin M. Starkey, Guei-Yuan Lueh, Wei-Yu Chen

IPC: G06T1/20, G06T1/60

Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as a single write instead of multiple separate partial writes.
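The effect of combining grouped partial writes can be illustrated with a small sketch. It is a software analogy under stated assumptions, not the mechanism claimed in the patent: the function `merge_partial_writes`, the byte-list register model, and the write counter are hypothetical.

```python
def merge_partial_writes(register, partial_writes):
    """Combine grouped partial writes and apply them as one register update.

    register: list of byte values.
    partial_writes: list of (offset, data) tuples, where data is a list of bytes.
    Returns (new register contents, number of register writes issued).
    """
    staged = list(register)  # staging buffer that collects the merged output
    for offset, data in partial_writes:
        # Each partial result lands in its own byte range of the buffer.
        staged[offset:offset + len(data)] = data
    # The merged buffer is committed once, not once per partial write.
    writes_issued = 1 if partial_writes else 0
    return staged, writes_issued


register = [0] * 8
merged, writes_issued = merge_partial_writes(register, [(0, [1, 2]), (4, [3, 4])])
```

In this sketch two partial writes to the same register result in a single committed update; without grouping, each partial write would be issued separately.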

89.

Invention application
SPARSE MATRIX OPTIMIZATION MECHANISM (Status: in force)

Publication number: US20220092723A1

Publication date: 2022-03-24

Application number: US17465821

Application date: 2021-09-02

Applicant: Intel Corporation

Inventor: Namita Sharma, Supratim Pal, Biju P. Simon, Tovinakere D. Vivek

IPC: G06T1/20

Abstract: An apparatus to facilitate matrix processing is disclosed. The apparatus comprises a matrix accelerator to receive input matrix data, transform the input matrix data into a plurality of sub-blocks, examine a first block of the sub-blocks to determine whether the first block comprises sparse data, select a first tile size upon a determination that the first block comprises sparse data and generate output matrix data based on the first tile size.

90.

Invention application
COMPUTING EFFICIENT CROSS CHANNEL OPERATIONS IN PARALLEL COMPUTING MACHINES USING SYSTOLIC ARRAYS (Status: in force)

Publication number: US20210365402A1

Publication date: 2021-11-25

Application number: US16900236

Application date: 2020-06-12

Applicant: Intel Corporation

Inventor: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram

IPC: G06F15/80

Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, the systolic array circuit modified to receive inputs from the single source register and route elements of the single source register to multiple channels in the systolic array circuit.
