Patent search ap:("Intel Corporation") AND inv:"Wei-Yu Chen" Page 1

1.

Patent application
INSTRUCTION ENCODING TO IMPLEMENT INCREASED REGISTER CAPACITY PER THREAD (Valid)

Publication (announcement) number: US20250068423A1

Publication (announcement) date: 2025-02-27

Application number: US18453861

Application date: 2023-08-22

Applicant: Intel Corporation

Inventor: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, Vasanth Ranganathan, Guei-Yuan Lueh, James Valerio, Pradeep Golconda, Brent Schwartz, Fangwen Fu, Sabareesh Ganapathy, Peter Caday, Wei-Yu Chen, Po-Yu Chen, Timothy Bauer, Maxim Kazakov, Stanley Gambarin, Samir Pandya

IPC: G06F9/30, G06F9/38

Abstract: Described herein is a graphics processor comprising first circuitry configured to execute a decoded instruction and second circuitry configured to decode an instruction into the decoded instruction. The second circuitry is configured to determine a number of registers within a register file that are available to a thread of the processing resource and decode the instruction based on that number of registers.

2.

Granted patent
Compiler assisted register file write reduction (Valid)

Publication (announcement) number: US11900502B2

Publication (announcement) date: 2024-02-13

Application number: US17734983

Application date: 2022-05-02

Applicant: Intel Corporation

Inventor: Chandra S. Gurram, Gang Y. Chen, Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Jorge E. Parra, Darin M. Starkey, Guei-Yuan Lueh, Wei-Yu Chen

IPC: G06T1/20, G06T1/60

CPC classification number: G06T1/20, G06T1/60

Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as a single write instead of multiple separate partial writes.
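The combining step in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the register is modeled as a Python list of lanes, each partial write is an (offset, data) pair, and the function name merge_partial_writes is hypothetical. It does not represent the hardware mechanism or the compiler hint format claimed in the patent.

```python
def merge_partial_writes(register, partial_writes):
    """Stage a group of partial writes, then return one full-register value.

    register:       list of lane values (hypothetical model of one register)
    partial_writes: list of (offset, data) pairs, each covering part of it
    """
    staged = list(register)  # staging copy; the caller's register is untouched
    for offset, data in partial_writes:
        if offset < 0 or offset + len(data) > len(staged):
            raise ValueError("partial write falls outside the register")
        staged[offset:offset + len(data)] = data
    # The caller commits `staged` once, instead of once per partial write.
    return staged


register = [0] * 8
group = [(0, [1, 2]), (4, [3, 4])]  # two partial writes to the same register
register = merge_partial_writes(register, group)  # one commit for the group
```

In this model the grouped instructions cost a single register update regardless of how many partial writes the group contains.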

3.

Patent application
PROVIDING NATIVE SUPPORT FOR GENERIC POINTERS IN A GRAPHICS PROCESSING UNIT (Valid)

Publication (announcement) number: US20230102538A1

Publication (announcement) date: 2023-03-30

Application number: US17484066

Application date: 2021-09-24

Applicant: Intel Corporation

Inventor: Joydeep Ray, Prathamesh Raghunath Shinde, Ben J. Ashbaugh, Wei-Yu Chen, Abhishek R. Appu, Vasanth Ranganathan, Dmitry Yurievich Babokin, Ankur N. Shah

IPC: G06T1/20, G06T1/60, G06F9/38, G06F9/30

Abstract: Embodiments are directed to systems and methods for supporting generic pointers in hardware of a GPU. According to one embodiment, a GPU includes multiple sub-cores each having a processing resource and a load/store pipeline. The processing resource is operable to receive a memory access message including a pointer and a memory type identifier indicative of the pointer representing a generic pointer. The processing resource is further operable to output a load or store operation to the load/store pipeline based on the memory access message, including computing an address for the load or store operation by adding a base address of a named memory type of a plurality of named memory types referenced by the generic pointer to an offset into a memory of the named memory type. The load/store pipeline is operable to, responsive to receipt of the load or store operation, access the memory at the address.
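The address computation described in this abstract (base address of the named memory type plus an offset) might look roughly like the following. Everything specific here is an assumption made for illustration: the tag position (TAG_SHIFT), the tag values, the base addresses in BASES, and the function name resolve_generic_pointer do not come from the patent.

```python
TAG_SHIFT = 60  # hypothetical: the top 4 bits of a 64-bit pointer select the memory type

# Hypothetical base addresses for three named memory types.
BASES = {
    0: 0x0000_0000,  # e.g. global memory
    1: 0x1000_0000,  # e.g. shared local memory
    2: 0x2000_0000,  # e.g. private memory
}


def resolve_generic_pointer(ptr):
    """Return (memory_type, address) for a tagged generic pointer."""
    memory_type = ptr >> TAG_SHIFT
    offset = ptr & ((1 << TAG_SHIFT) - 1)
    # Address = base of the named memory type + offset into that memory.
    return memory_type, BASES[memory_type] + offset


generic = (1 << TAG_SHIFT) | 0x40  # offset 0x40 into memory type 1
memory_type, address = resolve_generic_pointer(generic)
```

The point of doing this in hardware, as the abstract suggests, is that the program issues one memory access message and does not need to branch on the memory type itself.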

4.

Patent application
FUSED INSTRUCTION TO ACCELERATE PERFORMANCE OF SECURE HASH ALGORITHM 2 (SHA-2) WORKLOADS IN A GRAPHICS ENVIRONMENT (Valid)

Publication (announcement) number: US20220416999A1

Publication (announcement) date: 2022-12-29

Application number: US17358897

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Supratim Pal, Wajdi Feghali, Changwon Rhee, Wei-Yu Chen, Timothy R. Bauer, Alexander Lyashevsky

IPC: H04L9/06, G06F9/38, G06T15/00

Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the function control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.
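For context, the SHA-256 message schedule defined in FIPS 180-4 uses a function that combines exactly the three operation kinds named in this abstract: two rotates, one shift, and xors. The sketch below shows that standard function in software. Whether the patent's fused sub-functions correspond to this particular function is an assumption; the helper names rotr32 and small_sigma0 are illustrative.

```python
MASK32 = 0xFFFFFFFF


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x):
    """SHA-256 sigma0: ROTR7(x) xor ROTR18(x) xor SHR3(x), per FIPS 180-4."""
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)


word = 0x00000001
print(hex(small_sigma0(word)))  # prints 0x2004000
```

Written out this way, one sigma0 evaluation takes five separate operations, which is the kind of sequence a single fused instruction would replace.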

5.

Patent application
MULTIPLE REGISTER ALLOCATION SIZES FOR THREADS (Valid)

Publication (announcement) number: US20220413916A1

Publication (announcement) date: 2022-12-29

Application number: US17358650

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal

IPC: G06F9/50, G06F9/48, G06T1/20

Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
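The dispatch sequence listed in this abstract (determine register size, find processing resources with enough space, pick one, pick a thread slot, allocate) can be modeled as follows. This is a sketch only: the ProcessingResource class, the dispatch function, the register counts, and the "most free registers" selection policy are all assumptions, since the abstract does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingResource:
    total_registers: int
    num_slots: int
    slots: dict = field(default_factory=dict)  # slot index -> registers allocated

    def free_registers(self):
        return self.total_registers - sum(self.slots.values())

    def free_slot(self):
        """Return the lowest unused thread slot index, or None if all are taken."""
        for index in range(self.num_slots):
            if index not in self.slots:
                return index
        return None


def dispatch(resources, registers_needed):
    """Assign a thread needing `registers_needed` registers; return (resource, slot) or None."""
    # Identify resources with sufficient register space and an open slot.
    candidates = [
        i for i, r in enumerate(resources)
        if r.free_registers() >= registers_needed and r.free_slot() is not None
    ]
    if not candidates:
        return None
    # Selection policy (an assumption): prefer the resource with the most free registers.
    chosen = max(candidates, key=lambda i: resources[i].free_registers())
    slot = resources[chosen].free_slot()
    resources[chosen].slots[slot] = registers_needed  # allocate registers to the thread
    return chosen, slot


resources = [ProcessingResource(512, 4), ProcessingResource(512, 4)]
placement = dispatch(resources, 256)  # a thread compiled for 256 registers
```

A thread that needs more registers than any resource has free is simply not placed in this model; a real dispatcher would presumably queue it.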

6.

Granted patent
Register spill/fill using shared local memory space (Valid)

Publication (announcement) number: US11508338B2

Publication (announcement) date: 2022-11-22

Application number: US17062871

Application date: 2020-10-05

Applicant: Intel Corporation

Inventor: Joydeep Ray, Altug Koker, Balaji Vembu, Murali Ramadoss, Guei-Yuan Lueh, James A. Valerio, Prasoonkumar Surti, Abhishek R. Appu, Vasanth Ranganathan, Kalyan K. Bhiravabhatla, Arthur D. Hunter, Jr., Wei-Yu Chen, Subramaniam M. Maiyuran

IPC: G09G5/36, G06F12/0875, G06F9/46, G09G5/00, G06F12/084, G06F12/0811

Abstract: A mechanism is described for facilitating use of a shared local memory for register spilling/filling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes reserving one or more spaces of a shared local memory (SLM) to perform one or more of spilling and filling relating to registers associated with a graphics processor of a computing device.

7.

Patent application
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING (Valid)

Publication (announcement) number: US20220326953A1

Publication (announcement) date: 2022-10-13

Application number: US17723312

Application date: 2022-04-18

Applicant: Intel Corporation

Inventor: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George

IPC: G06F9/38, G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
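The idea of zero skipping can be shown with a scalar software model: a multiply/add step is skipped whenever either input is zero, because it would not change the accumulator. This sketch uses integer inputs and a hypothetical function name, multiply_add_zero_skip. It does not model the predicate mask, repeat count, or operand encoding of the hardware macro instruction.

```python
def multiply_add_zero_skip(acc, a, b):
    """Accumulate a[i] * b[i] into acc, skipping products with a zero input.

    Returns (result, multiplies_performed). Integer inputs are assumed.
    """
    performed = 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            continue  # a zero product leaves the accumulator unchanged
        acc += x * y
        performed += 1
    return acc, performed


result, performed = multiply_add_zero_skip(10, [1, 0, 3, 0], [2, 5, 0, 7])
print(result, performed)  # prints: 12 1
```

For sparse input, the count of performed multiplies is what the skipping reduces; the result matches an ordinary multiply/add over all elements.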

8.

Granted patent
Compiler assisted register file write reduction (Valid)

Publication (announcement) number: US11321799B2

Publication (announcement) date: 2022-05-03

Application number: US16726659

Application date: 2019-12-24

Applicant: Intel Corporation

Inventor: Chandra S. Gurram, Gang Y. Chen, Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Jorge E. Parra, Darin M. Starkey, Guei-Yuan Lueh, Wei-Yu Chen

IPC: G06T1/20, G06T1/60

Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as a single write instead of multiple separate partial writes.

9.

Patent application
INSTRUCTION PREFETCH MECHANISM (Valid)

Publication (announcement) number: US20210279177A1

Publication (announcement) date: 2021-09-09

Application number: US17210867

Application date: 2021-03-24

Applicant: Intel Corporation

Inventor: Vasileios Porpodas, Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen

IPC: G06F12/0862, G06F12/0875, G06F9/30, G06F8/41

Abstract: An apparatus to facilitate data prefetching is disclosed. The apparatus includes a cache, one or more execution units (EUs) to execute program code, prefetch logic to maintain tracking information of memory instructions in the program code that trigger a cache miss and compiler logic to receive the tracking information, insert one or more pre-fetch instructions in updated program code to prefetch data from a memory for execution of one or more of the memory instructions that triggered a cache miss and download the updated program code for execution by the one or more EUs.

10.

Patent application
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING (Valid)

Publication (announcement) number: US20210191724A1

Publication (announcement) date: 2021-06-24

Application number: US16724831

Application date: 2019-12-23

Applicant: Intel Corporation

Inventor: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George

IPC: G06F9/38, G06F9/30

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
