-
Publication No.: US10963389B2
Publication Date: 2021-03-30
Application No.: US16787841
Filing Date: 2020-02-11
Applicant: Intel Corporation
Inventor: Vasileios Porpodas, Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen
IPC: G06F12/0862, G06F12/0875, G06F9/30, G06F8/41
Abstract: An apparatus to facilitate data prefetching is disclosed. The apparatus includes a cache, one or more execution units (EUs) to execute program code, prefetch logic to maintain tracking information of memory instructions in the program code that trigger a cache miss and compiler logic to receive the tracking information, insert one or more pre-fetch instructions in updated program code to prefetch data from a memory for execution of one or more of the memory instructions that triggered a cache miss and download the updated program code for execution by the one or more EUs.
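A minimal Python sketch of the feedback loop this abstract describes, under stated assumptions. A hypothetical PrefetchTracker stands in for the prefetch logic by counting cache misses per memory-instruction index. A hypothetical insert_prefetches pass stands in for the compiler logic. Program code is modeled as a list of (opcode, operand) tuples. The miss threshold and the fixed prefetch distance are illustrative parameters, not details taken from the patent.

```python
from collections import Counter


class PrefetchTracker:
    """Hypothetical stand-in for the prefetch logic: counts cache misses
    per memory-instruction index (its position in the program)."""

    def __init__(self):
        self.misses = Counter()

    def record_miss(self, pc):
        self.misses[pc] += 1


def insert_prefetches(program, tracking, min_misses=1, distance=2):
    """Hypothetical compiler pass: for every load that missed at least
    `min_misses` times, place a prefetch of the same operand `distance`
    instructions earlier (clamped to the start of the program)."""
    hot = {pc for pc, count in tracking.items() if count >= min_misses}
    inserts = {}  # original position -> prefetches to emit before it
    for pc, (op, operand) in enumerate(program):
        if op == "load" and pc in hot:
            inserts.setdefault(max(0, pc - distance), []).append(("prefetch", operand))
    updated = []
    for pc, instr in enumerate(program):
        updated.extend(inserts.get(pc, []))
        updated.append(instr)
    return updated


program = [("add", None), ("mul", None), ("load", "A[i]"), ("load", "B[i]")]
tracker = PrefetchTracker()
tracker.record_miss(2)  # only the load of A[i] missed
updated = insert_prefetches(program, tracker.misses)
# updated == [("prefetch", "A[i]"), ("add", None), ("mul", None),
#             ("load", "A[i]"), ("load", "B[i]")]
```

In the abstract the updated program code is then downloaded back to the EUs; in this sketch the returned list simply plays that role.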
-
Publication No.: US10691430B2
Publication Date: 2020-06-23
Application No.: US16113650
Filing Date: 2018-08-27
Applicant: Intel Corporation
Inventor: Wei Pan, Wei-Yu Chen, Guei-Yuan Lueh
Abstract: An apparatus to facilitate instruction scheduling is disclosed. The apparatus includes one or more processors to receive a block of instructions, divide the block of instructions into a plurality of sub-blocks based on a register pressure bounded by a predetermined threshold and schedule instructions in each of the plurality of sub-blocks for processing.
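A rough Python sketch of the dividing step, assuming instructions are (dest, srcs) tuples and using the number of distinct registers touched in a sub-block as a crude proxy for register pressure. A real scheduler would use liveness analysis. The function name split_by_register_pressure and the greedy cut policy are illustrative assumptions, not the patented method.

```python
def split_by_register_pressure(block, threshold):
    """Greedily cut `block` into sub-blocks whose register footprint stays
    within `threshold`.

    `block` is a list of (dest, srcs) tuples. The footprint (distinct
    registers read or written in a sub-block) is a crude stand-in for
    liveness-based register pressure. An instruction that alone exceeds
    the threshold still gets a sub-block of its own.
    """
    sub_blocks, current, regs = [], [], set()
    for dest, srcs in block:
        needed = {dest, *srcs}
        if current and len(regs | needed) > threshold:
            # close the current sub-block before its footprint exceeds the bound
            sub_blocks.append(current)
            current, regs = [], set()
        current.append((dest, srcs))
        regs |= needed
    if current:
        sub_blocks.append(current)
    return sub_blocks


block = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
sub_blocks = split_by_register_pressure(block, threshold=3)
# sub_blocks == [[("r1", ("r0",)), ("r2", ("r1",))],
#                [("r3", ("r2",)), ("r4", ("r3",))]]
```

Each sub-block would then be passed to an ordinary list scheduler. That step is left out here.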
-
Publication No.: US20190265973A1
Publication Date: 2019-08-29
Application No.: US15903283
Filing Date: 2018-02-23
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Darin M. Starkey, Guei-Yuan Lueh, Jorge E. Parra, Shubh B. Shah, Wei-Yu Chen, Vikranth Vemulapalli, Narsim Krishna, Brent A. Schwartz, Chandra S. Gurram, Wei Pan, Ashwin J. Shivani
Abstract: Methods and apparatus relating to techniques for fusing SIMD processing units. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive an instruction set for execution on at least two graphics processing execution units, determine whether the instruction set requires data dependent addressing, and select between a synchronized execution environment for the at least two graphics processing units and an unsynchronized execution environment for the at least two graphics processing units based at least in part on the determination whether the instruction set requires data dependent addressing. Other embodiments are also disclosed and claimed.
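A small Python sketch of the selection step, assuming instructions are (op, dest, srcs) tuples. For load and store, srcs[0] is taken to be the address register. "Data dependent addressing" is approximated as an address derived, possibly through arithmetic, from a value that was itself loaded. The abstract does not say which environment goes with which outcome. The mapping used here (data-dependent addressing selects unsynchronized execution) is an assumption.

```python
from enum import Enum


class ExecMode(Enum):
    SYNCHRONIZED = "synchronized"      # fused EUs step together
    UNSYNCHRONIZED = "unsynchronized"  # each EU proceeds independently


def requires_data_dependent_addressing(instructions):
    """Return True if any load/store address is derived from loaded data.

    Instructions are (op, dest, srcs) tuples; for load/store, srcs[0] is
    the address register. Taint from loads is propagated through other ops.
    """
    tainted = set()  # registers whose value depends on memory contents
    for op, dest, srcs in instructions:
        if op in ("load", "store") and srcs and srcs[0] in tainted:
            return True
        if dest is not None and (op == "load" or any(s in tainted for s in srcs)):
            tainted.add(dest)
    return False


def select_mode(instructions):
    # Assumed mapping: data-dependent addressing -> unsynchronized execution.
    if requires_data_dependent_addressing(instructions):
        return ExecMode.UNSYNCHRONIZED
    return ExecMode.SYNCHRONIZED


streaming = [("load", "r1", ("r0",)), ("add", "r2", ("r1", "r1")), ("store", None, ("r0", "r2"))]
gather = [("load", "r1", ("r0",)), ("add", "r2", ("r1", "r3")), ("load", "r4", ("r2",))]
```

In the streaming case every address comes from r0, which no load wrote. In the gather case the second load's address r2 is computed from loaded data.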
-
Publication No.: US09766892B2
Publication Date: 2017-09-19
Application No.: US14581858
Filing Date: 2014-12-23
Applicant: Intel Corporation
Inventor: Wei-Yu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
CPC classification number: G06F9/30058, G06F9/3005, G06F9/30072, G06F9/30145, G06F9/3017, G06F9/3824, G06F9/3851, G06F9/3887, G06F9/46, G06T1/20
Abstract: An apparatus and method for executing nested control flow instructions on a graphics processing unit (GPU). For example, one embodiment of a processor comprises: an execution unit having a plurality of channels to execute control flow instructions including fused control flow instructions comprising two or more consecutive control flow instructions fused into a single fused control flow instruction; and a branch unit to process the control flow instructions and to maintain a global counter indicating a nesting level of the control flow instructions, wherein to process a fused control flow instruction, the branch unit is to store a value N in a stack indicating a number of control flow instructions fused into the fused control flow instruction, the branch unit to subsequently read the value N from the stack upon execution of the fused control flow instruction and decrement the global counter by a value of N responsive to execution of the fused control flow instruction.
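A toy Python model of the counter-and-stack bookkeeping in this abstract. fuse_consecutive collapses a run of consecutive "endif" instructions into one fused ("endif", N) instruction. A hypothetical BranchUnit pushes N when the fused instruction is processed, then pops N on execution and decrements the global nesting counter by N. Treating each "if" as opening exactly one nesting level, and performing processing and execution back to back, are simplifying assumptions. The real hardware pipeline is not modeled.

```python
def fuse_consecutive(ops, kind="endif"):
    """Collapse each run of consecutive `kind` ops into one (kind, N) op;
    every other op becomes (op, 1)."""
    fused = []
    for op in ops:
        if op == kind and fused and fused[-1][0] == kind:
            fused[-1] = (kind, fused[-1][1] + 1)
        else:
            fused.append((op, 1))
    return fused


class BranchUnit:
    """Toy model: a global nesting counter plus a stack holding N for
    each fused control-flow instruction."""

    def __init__(self):
        self.global_counter = 0
        self.stack = []

    def process_fused(self, n):
        self.stack.append(n)  # record how many instructions were fused

    def execute_fused(self):
        n = self.stack.pop()      # read N back from the stack
        self.global_counter -= n  # unwind N nesting levels in one step
        return n


def run(fused_ops):
    bu, peak = BranchUnit(), 0
    for op, n in fused_ops:
        if op == "if":
            bu.global_counter += n  # assumption: each 'if' opens one level
        elif op == "endif":
            bu.process_fused(n)
            bu.execute_fused()
        peak = max(peak, bu.global_counter)
    return bu, peak


ops = ["if", "if", "if", "body", "endif", "endif", "endif"]
fused = fuse_consecutive(ops)  # the three endifs become ("endif", 3)
bu, peak = run(fused)
```

The point of the fusion is that three closing instructions cost one branch-unit step, while the counter still returns to the correct nesting level.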
-
Publication No.: US20160162345A1
Publication Date: 2016-06-09
Application No.: US14877582
Filing Date: 2015-10-07
Applicant: Intel Corporation
Inventor: Wei-Yu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
CPC classification number: G06F9/547, G06F9/30061, G06F9/449, G06F9/455, G06F15/8007
Abstract: Systems and methods of enabling virtual calls in a single instruction multiple data (SIMD) environment may involve detecting a virtual call of a function and using a single dispatch of the function to invoke the virtual call for two or more channels of the virtual call. In one example, it is determined that the two or more channels share a common target address and a single dispatch of the function is conducted with respect to the common target address. The process may be iterated for additional channels of the virtual call that share a common target address.
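A Python sketch of the grouping idea, assuming each SIMD channel has already resolved its virtual call to a target address. The loop picks the target of the first remaining channel and dispatches the function once for all channels sharing that address. It then repeats for the channels that are left. The names dispatch_virtual_call and invoke are hypothetical, and "first remaining channel" is an assumed pick order.

```python
def dispatch_virtual_call(targets, active, invoke):
    """Dispatch a SIMD virtual call once per distinct target address.

    targets[i] is the function address channel i resolved; active[i]
    says whether channel i participates. Each iteration takes the target
    of the first remaining channel, invokes it once for every remaining
    channel sharing that address, then retires those channels.
    Returns the number of dispatches performed.
    """
    remaining = [i for i, on in enumerate(active) if on]
    dispatches = 0
    while remaining:
        target = targets[remaining[0]]
        group = [i for i in remaining if targets[i] == target]
        invoke(target, group)  # single dispatch covering the whole group
        dispatches += 1
        remaining = [i for i in remaining if targets[i] != target]
    return dispatches


calls = []
targets = [0x100, 0x200, 0x100, 0x100, 0x200, 0x300, 0x100, 0x100]
n = dispatch_virtual_call(targets, [True] * 8, lambda t, g: calls.append((t, g)))
# n == 3 dispatches instead of 8 per-channel calls
```

The number of dispatches equals the number of distinct targets among active channels, so the saving is largest when most channels agree.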
-
Publication No.: US12299766B2
Publication Date: 2025-05-13
Application No.: US17484066
Filing Date: 2021-09-24
Applicant: Intel Corporation
Inventor: Joydeep Ray, Prathamesh Raghunath Shinde, Ben J. Ashbaugh, Wei-Yu Chen, Abhishek R. Appu, Vasanth Ranganathan, Dmitry Yurievich Babokin, Ankur N. Shah
Abstract: Systems and methods for supporting generic pointers in hardware of a graphics processing unit (GPU) are provided. In various examples, a GPU includes multiple sub-cores each having a processing resource and a load/store pipeline. The processing resource is operable to receive a memory access message including a pointer and a memory type identifier indicative of the pointer representing a generic pointer. The processing resource is further operable to output a load or store operation to the load/store pipeline based on the memory access message, including computing an address for the load or store operation by adding a base address of a named memory type of a plurality of named memory types referenced by the generic pointer to an offset into a memory of the named memory type. The load/store pipeline is operable to, responsive to receipt of the load or store operation, access the memory at the address.
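A Python sketch of the address computation in this abstract: base address of the named memory type plus an offset into that memory. The abstract does not specify how a generic pointer encodes its named memory type. The tag-in-the-top-bits layout, the MemType values, and the base addresses below are all hypothetical. Treating a non-generic pointer as a plain offset into its named memory is also an assumption.

```python
from enum import IntEnum


class MemType(IntEnum):
    GLOBAL = 0
    LOCAL = 1    # e.g. shared local memory
    PRIVATE = 2
    GENERIC = 3  # identifier meaning "resolve the named type from the pointer"


# Hypothetical encoding: the top 2 bits of a 64-bit generic pointer name the
# memory type; the low 62 bits are the offset into that memory.
TAG_SHIFT = 62
OFFSET_MASK = (1 << TAG_SHIFT) - 1

# Hypothetical per-type base addresses.
BASES = {MemType.GLOBAL: 0x0, MemType.LOCAL: 0x7000_0000, MemType.PRIVATE: 0x8000_0000}


def make_generic(named, offset):
    return (int(named) << TAG_SHIFT) | (offset & OFFSET_MASK)


def compute_address(pointer, mem_type, bases=BASES):
    """Address for the load/store: base of the named memory type + offset.
    A pointer tagged GENERIC has no base here and raises KeyError."""
    if mem_type == MemType.GENERIC:
        named = MemType(pointer >> TAG_SHIFT)
        offset = pointer & OFFSET_MASK
    else:
        named, offset = mem_type, pointer  # assumption: named pointers are plain offsets
    return bases[named] + offset
```

For example, a generic pointer tagged LOCAL with offset 0x40 resolves to 0x7000_0000 + 0x40 under these assumed bases.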
-
Publication No.: US12210905B2
Publication Date: 2025-01-28
Application No.: US17358650
Filing Date: 2021-06-25
Applicant: Intel Corporation
Inventor: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal
Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
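A minimal Python sketch of the dispatch steps listed in this abstract. The steps are: determine the thread's register size, find processing resources with enough free register space, select one, take an available thread slot, and allocate the registers. ProcessingResource and dispatch_thread are hypothetical names. The best-fit choice among candidates is an assumed policy, since the abstract does not say how the processing resource is chosen.

```python
from dataclasses import dataclass


@dataclass
class ProcessingResource:
    name: str
    free_registers: int
    thread_slots: list  # None marks a free slot; otherwise holds a thread id


def dispatch_thread(thread_id, register_size, resources):
    """Assign a thread to a resource with enough registers and a free slot.
    Returns (resource name, slot index), or None if nothing fits."""
    candidates = [r for r in resources
                  if r.free_registers >= register_size and None in r.thread_slots]
    if not candidates:
        return None
    # Assumed policy: best fit, i.e. the least register space left over.
    chosen = min(candidates, key=lambda r: r.free_registers - register_size)
    slot = chosen.thread_slots.index(None)  # first available thread slot
    chosen.thread_slots[slot] = thread_id
    chosen.free_registers -= register_size  # allocate the thread's registers
    return chosen.name, slot


eu0 = ProcessingResource("EU0", 128, [None, None])
eu1 = ProcessingResource("EU1", 256, [None])
resources = [eu0, eu1]
```

Because register size is per thread rather than fixed, a large thread can land on EU1 while smaller threads still fit on EU0.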
-
Publication No.: US20240086329A1
Publication Date: 2024-03-14
Application No.: US18470553
Filing Date: 2023-09-20
Applicant: Intel Corporation
Inventor: Vasileios Porpodas, Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen
IPC: G06F12/0862, G06F8/41, G06F9/30, G06F12/0875
CPC classification number: G06F12/0862, G06F8/41, G06F8/4442, G06F9/30047, G06F12/0875, G06F2201/885, G06F2212/1016, G06F2212/452, G06F2212/502, G06F2212/602, G06F2212/6028
Abstract: An apparatus to facilitate data prefetching is disclosed. The apparatus includes a cache, one or more execution units (EUs) to execute program code, prefetch logic to maintain tracking information of memory instructions in the program code that trigger a cache miss and compiler logic to receive the tracking information, insert one or more pre-fetch instructions in updated program code to prefetch data from a memory for execution of one or more of the memory instructions that triggered a cache miss and download the updated program code for execution by the one or more EUs.
-
Publication No.: US11803476B2
Publication Date: 2023-10-31
Application No.: US17210867
Filing Date: 2021-03-24
Applicant: Intel Corporation
Inventor: Vasileios Porpodas, Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen
IPC: G06F12/0862, G06F12/0875, G06F9/30, G06F8/41
CPC classification number: G06F12/0862, G06F8/41, G06F8/4442, G06F9/30047, G06F12/0875, G06F2201/885, G06F2212/1016, G06F2212/452, G06F2212/502, G06F2212/602, G06F2212/6028
Abstract: An apparatus to facilitate data prefetching is disclosed. The apparatus includes a cache, one or more execution units (EUs) to execute program code, prefetch logic to maintain tracking information of memory instructions in the program code that trigger a cache miss and compiler logic to receive the tracking information, insert one or more pre-fetch instructions in updated program code to prefetch data from a memory for execution of one or more of the memory instructions that triggered a cache miss and download the updated program code for execution by the one or more EUs.
-
Publication No.: US11579878B2
Publication Date: 2023-02-14
Application No.: US16881920
Filing Date: 2020-05-22
Applicant: Intel Corporation
Inventor: Pratik J. Ashar, Supratim Pal, Subramaniam Maiyuran, Wei-Yu Chen, Guei-Yuan Lueh
Abstract: An apparatus is disclosed. The apparatus includes one or more processors comprising register sharing circuitry to receive meta-information indicating a number of threads that are to be disabled and provide an indication that an associated thread is disabled, a plurality of General Purpose Register Files (GRFs), wherein one or more of the plurality of GRFs is associated with one of the plurality of threads and a plurality of multiplexers coupled to the one or more GRFs to receive the indication from the register sharing circuitry and disable thread access to an associated GRF based on an indication that a thread is to be disabled.
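A toy Python model of the disable path in this abstract. Meta-information gives the number of threads to disable. The register sharing logic turns that number into a per-thread disable indication. A per-thread multiplexer then blocks access to that thread's GRF. Which threads get disabled is not stated in the abstract. Disabling the highest-numbered ones is an assumption. GrfMux, disabled_flags, and configure are hypothetical names, and a GRF is modeled as a plain list.

```python
def disabled_flags(num_threads, num_to_disable):
    """Turn meta-information (how many threads to disable) into per-thread
    disable indications. Assumption: the highest-numbered threads go first."""
    return [i >= num_threads - num_to_disable for i in range(num_threads)]


class GrfMux:
    """Models the multiplexer gating one thread's access to its GRF."""

    def __init__(self, grf):
        self.grf = grf
        self.thread_disabled = False

    def read(self, reg):
        if self.thread_disabled:
            raise PermissionError("thread disabled: access to its GRF is blocked")
        return self.grf[reg]


def configure(grfs, num_to_disable):
    muxes = [GrfMux(grf) for grf in grfs]
    for mux, disabled in zip(muxes, disabled_flags(len(grfs), num_to_disable)):
        mux.thread_disabled = disabled  # indication from the register sharing logic
    return muxes


grfs = [[10 * t + r for r in range(4)] for t in range(4)]  # one toy GRF per thread
muxes = configure(grfs, num_to_disable=2)
```

Threads 0 and 1 keep reading their own GRFs, while reads through the multiplexers of threads 2 and 3 are refused.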
-