-
31.
公开(公告)号:US20190324757A1
公开(公告)日:2019-10-24
申请号:US15957695
申请日:2018-04-19
Applicant: Intel Corporation
Inventor: James Valerio , Ben Ashbaugh , Pradeep Ramani , Rebecca David , Sabareesh Ganapathy , Hashem Hashemi
Abstract: Embodiments described herein provide techniques to maintain high temporal cache locality between independent threads having the same or similar memory access pattern. One embodiment provides a graphics processing unit comprising an instruction execution pipeline including hardware execution logic and a thread dispatcher to process a set of commands for execution and distribute multiple groups of hardware threads to the hardware execution logic to execute the set of commands. The thread dispatcher can be configured to concurrently distribute a first group of the multiple groups of hardware threads to the hardware execution logic and withhold distribution of additional hardware threads for the set of commands until after the first group completes execution.
-
公开(公告)号:US20190205163A1
公开(公告)日:2019-07-04
申请号:US15860708
申请日:2018-01-03
Applicant: Intel Corporation
Inventor: Kiran C. Veernapu , Kamlesh Pillai , James Valerio , Joydeep Ray , Abhishek Appu
CPC classification number: G06F9/4881 , G06F9/22 , G06F9/54 , G06T1/20 , G06T15/005
Abstract: A mechanism is described to facilitate microcontroller-based flexible thread scheduling launching in computing environments. An apparatus of embodiments, as described herein, includes facilitating a graphics processor hosting a microcontroller having a thread scheduling unit, and detection and observation logic to detect a scheduling algorithm associated with an application at the apparatus. The apparatus may further include reading and dispatching logic to facilitate the microcontroller to prepare a flexible dispatch routine based on the scheduling algorithm. The apparatus may further include scheduling and launching logic to facilitate the thread scheduling unit to dynamically schedule and launch threads based on the flexible dispatch routine, where the threads are hosted by the graphics processor.
-
公开(公告)号:US20250147762A1
公开(公告)日:2025-05-08
申请号:US18504407
申请日:2023-11-08
Applicant: Intel Corporation
Inventor: Vasanth Ranganathan , Gang Chen , Supratim Pal , Jorge Eduardo Parra Osorio , Arthur Hunter , Boris Kuznetsov , Deepak N K , Siva Kumar Seemakurthi , James Valerio , Shubham Dinesh Chavan , Abhishek Kumar Singh , Samir Pandya , Sandeep Tippannanavar Niranjan , Alan Curtis , Jain Philip , Maltesh Kulkarni , Fangwen Fu , John Wiegert , Brent Schwartz
Abstract: Described herein is a graphics processor having processing resources with configurable thread and register configurations. Program code can configure a number of registers and accumulators that will be used by hardware threads during execution of the program code by the graphics processor. Processing resources within the graphics processor can be configured to assign different numbers of registers and accumulators to hardware threads based on the configuration requested by program code to be executed by the processing resource.
-
公开(公告)号:US20250068423A1
公开(公告)日:2025-02-27
申请号:US18453861
申请日:2023-08-22
Applicant: Intel Corporation
Inventor: Jorge Eduardo Parra Osorio , Jiasheng Chen , Supratim Pal , Vasanth Ranganathan , Guei-Yuan Lueh , James Valerio , Pradeep Golconda , Brent Schwartz , Fangwen Fu , Sabareesh Ganapathy , Peter Caday , Wei-Yu Chen , Po-Yu Chen , Timothy Bauer , Maxim Kazakov , Stanley Gambarin , Samir Pandya
Abstract: Described herein is a graphics processor comprising first circuitry configured to execute a decoded instruction and second circuitry configured to second circuitry configured to decode an instruction into the decoded instruction. The second circuitry is configured to determine a number of registers within a register file that are available to a thread of the processing resource and decode the instruction based on that number of registers.
-
公开(公告)号:US20250037347A1
公开(公告)日:2025-01-30
申请号:US18358297
申请日:2023-07-25
Applicant: Intel Corporation
Inventor: Jiasheng Chen , Supratim Pal , Kevin Hurd , Jorge E. Parra Osorio , Christopher Spencer , Takashi Nakagawa , Guei-Yuan Lueh , Pradeep K. Golconda , James Valerio , Mukundan Swaminathan , Nicholas Murphy , Clifford Gibson , Li-An Tang , Fangwen Fu , Kaiyu Chen , Buqi Cheng
Abstract: Described herein is a graphics processor comprising an instruction cache and a plurality of processing elements coupled with the instruction cache. The plurality of processing elements include functional units configured to provide an integer pipeline to execute instructions to perform operations on integer data elements. The integer pipeline including a first multiplier and a second multiplier, the first multiplier and the second multiplier configured to execute operations for a single instruction.
-
公开(公告)号:US12164952B2
公开(公告)日:2024-12-10
申请号:US17358882
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Vasanth Ranganathan , James Valerio , Joydeep Ray , Abhishek R. Appu , Alan Curtis , Prathamesh Raghunath Shinde , Brandon Fliflet , Ben J. Ashbaugh , John Wiegert
Abstract: An apparatus to facilitate barrier state save and restore for preemption in a graphics environment is disclosed. The apparatus includes processing resources to execute a plurality of execution threads that are comprised in a thread group (TG) and mid-thread preemption barrier save and restore hardware circuitry to: initiate an exception handling routine in response to a mid-thread preemption event, the exception handling routine to cause a barrier signaling event to be issued; receive indication of a valid designated thread status for a thread of a thread group (TG) in response to the barrier signaling event; and in response to receiving the indication of the valid designated thread status for the thread of the TG, cause, by the thread of the TG having the valid designated thread status, a barrier save routine and a barrier restore routine to be initiated for named barriers of the TG.
-
公开(公告)号:US12099461B2
公开(公告)日:2024-09-24
申请号:US17431034
申请日:2020-03-14
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Mike Macpherson , Subramaniam Maiyuran , Joydeep Ray , Lakshminarayanan Striramassarma , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter , Prasoonkumar Surti , David Puffer , James Valerio , Ankur N. Shah
IPC: G06F16/00 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/78 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, an apparatus comprises a cache memory, a high-bandwidth memory, a shader core communicatively coupled to the cache memory and comprising a processing element to decompress a first data element extracted from an in-memory database in the cache memory and having a first bit length to generate a second data element having a second bit length, greater than the first bit length, and an arithmetic logic unit (ALU) to compare the data element to a target value provided in a query of the in-memory database. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20240220254A1
公开(公告)日:2024-07-04
申请号:US18148997
申请日:2022-12-30
Applicant: Intel Corporation
Inventor: Chunhui Mei , Yongsheng Liu , John A. Wiegert , Vasanth Ranganathan , Ben J. Ashbaugh , Fangwen Fu , Hong Jiang , Guei-Yuan Lueh , James Valerio , Alan M. Curtis , Maxim Kazakov
CPC classification number: G06F9/30087 , G06F9/3877 , G06F9/5072 , G06F9/544
Abstract: Data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a first processor, the first processor including one or more clusters of cores and a memory, wherein each cluster of cores includes multiple cores, each core including one or more processing resources, shared memory, and broadcast circuitry; and wherein a first core in a first cluster of cores is to request a data element, determine whether any additional cores in the first cluster require the data element, and, upon determining that one or more additional cores in the first cluster require the data element, broadcast the data element to the one or more additional cores via interconnects between the broadcast circuitry of the cores of the first core cluster.
-
39.
公开(公告)号:US20230153176A1
公开(公告)日:2023-05-18
申请号:US17528386
申请日:2021-11-17
Applicant: Intel Corporation
Inventor: Chunhui Mei , James Valerio , Supratim Pal , Guei-Yuan Lueh , Hong Jiang
Abstract: An apparatus to facilitate facilitating forward progress guarantee using single-level synchronization at individual thread granularity is disclosed. The apparatus includes a processor comprising a barrier synchronization hardware circuitry to assign a set of global named barrier identifiers (IDs) to individual execution threads of a plurality of execution threads and synchronize execution of the individual execution threads on a single level via the set of global named barrier IDs; and a plurality of processing resources to execute the plurality of execution threads and comprising divergent barrier scheduling hardware circuitry to facilitate execution flow switching from a first divergent branch executed by a first thread to a second divergent branch executed by a second thread, the execution flow switching performed responsive to the first thread stalling to wait on a named barrier of the set of global named barrier IDs.
-
公开(公告)号:US20230090973A1
公开(公告)日:2023-03-23
申请号:US17480528
申请日:2021-09-21
Applicant: Intel Corporation
Inventor: Joydeep Ray , Abhishek R. Appu , Timothy R. Bauer , James Valerio , Weiyu Chen , Subramaniam Maiyuran , Prasoonkumar Surti , Karthik Vaidyanathan , Carsten Benthin , Sven Woop , Jiasheng Chen
Abstract: One embodiment provides a graphics processor including a processing resource including a register file, memory, a cache memory, and load/store/cache circuitry to process load, store, and prefetch messages from the processing resource. The circuitry includes support for an immediate address offset that will be used to adjust the address supplied for a memory access to be requested by the circuitry. Including support for the immediate address offset removes the need to execute additional instructions to adjust the address to be accessed prior to execution of the memory access instruction.
-
-
-
-
-
-
-
-
-