-
公开(公告)号:US20200286201A1
公开(公告)日:2020-09-10
申请号:US16297129
申请日:2019-03-08
Applicant: Intel Corporation
Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Abhishek R. Appu , Ben J. Ashbaugh , Brandon Fliflet , Jeffery S. Boles , Srinivasan Embar Raghukrishnan , Rahul Kulkarni
Abstract: Embodiments described herein provide an apparatus comprising a processor to configure a plurality of contexts of a command engine to execute a graphics workload comprising a plurality of walkers, allocate, from a pool of execution units of a graphics processor, a subset of execution units to each walker in the plurality of walkers based at least in part on the predetermined number of walkers configured for the context, for each context in the plurality of contexts, dispatch one or more walkers of the plurality of walkers to the execution units, and upon dispatch of the one or more walkers of the plurality of walkers, write an opcode to a computer-readable memory indicating that the dispatch of the walker is complete, wherein the opcode comprises dependency data for the one or more walkers of the plurality of walkers. Other embodiments may be described and claimed.
-
公开(公告)号:US20250117356A1
公开(公告)日:2025-04-10
申请号:US18915492
申请日:2024-10-15
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Mike Macpherson , Subramaniam Maiyuran , Joydeep Ray , Lakshminarayanan Striramassarma , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter , Prasoonkumar Surti , David Puffer , James Valerio , Ankur N. Shah
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46
Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, a graphics processor includes an interposer, a first chiplet coupled with the interposer, the first chiplet including a graphics processing resource and an interconnect network coupled with the graphics processing resource, cache circuitry coupled with the graphics processing resource via the interconnect network, and a second chiplet coupled with the first chiplet via the interposer, the second chiplet including a memory-side cache and a memory controller coupled with the memory-side cache. The memory controller is configured to enable access to a high-bandwidth memory (HBM) device, the memory-side cache is configured to cache data associated with a memory access performed via the memory controller, and the cache circuitry is logically positioned between the graphics processing resource and a chiplet interface.
-
公开(公告)号:US12124852B2
公开(公告)日:2024-10-22
申请号:US18347964
申请日:2023-07-06
Applicant: Intel Corporation
Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Pradeep Ramani
CPC classification number: G06F9/3802 , G06F13/28 , G06T1/20
Abstract: A graphics processing device is provided that includes a set of compute units to execute a workload, a cache coupled with the set of compute units, and circuitry coupled with the cache and the set of compute units. The circuitry is configured to, in response to a cache miss for the read from a first cache, broadcast an event within the graphics processor device to identify data associated with the cache miss, receive the event at a second compute unit in the set of compute units, and prefetch the data identified by the event into a second cache that is local to the second compute unit before an attempt to read the instruction or data by the second thread.
-
公开(公告)号:US20240095038A1
公开(公告)日:2024-03-21
申请号:US17949904
申请日:2022-09-21
Applicant: Intel Corporation
Inventor: John Wiegert , Joydeep Ray , Timothy Bauer , James Valerio
CPC classification number: G06F15/7839 , G06F9/30043
Abstract: Embodiments described herein provide a technique to decompose 64-bit per-lane virtual addresses to access a plurality of data elements on behalf of a multi-lane parallel processing execution resource of a graphics or compute accelerator. The 64-bit per-lane addresses are decomposed into a base address and a plurality of per-lane offsets for transmission to memory access circuitry. The memory access circuitry then combines the base address and the per-lane offsets to reconstruct the per-lane addresses.
-
公开(公告)号:US11762662B2
公开(公告)日:2023-09-19
申请号:US17509726
申请日:2021-10-25
Applicant: Intel Corporation
Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Pradeep Ramani
CPC classification number: G06F9/3802 , G06F13/28 , G06T1/20
Abstract: A graphics processing device comprises a set of compute units to execute multiple threads of a workload, a cache coupled with the set of compute units, and a prefetcher to prefetch instructions associated with the workload. The prefetcher is configured to use a thread dispatch command that is used to dispatch threads to execute a kernel to prefetch instructions, parameters, and/or constants that will be used during execution of the kernel. Prefetch operations for the kernel can then occur concurrently with thread dispatch operations.
-
公开(公告)号:US20230094002A1
公开(公告)日:2023-03-30
申请号:US17484711
申请日:2021-09-24
Applicant: Intel Corporation
Inventor: Hema Chand Nalluri , Jeffery S. Boles , Joseph Koston , Ankur N. Shah , Vidhya Krishnan , Vasanth Ranganathan , Joydeep Ray , Aditya Navale , Murali Ramadoss , James Valerio
Abstract: Dynamic routing of texture-load in graphics processing is described. An example of an apparatus includes a graphics processor including a plurality of processing engines of a class of processing engines of the graphic processor; a set of queues for the plurality of processing engines; and a unified submit port for the plurality of processing engines, wherein the unified submit port is to notify a scheduler regarding availability of slots in the set of queues for receipt of workload contexts; and wherein, upon the unified submit port receiving a workload context for processing by the plurality of processing engines, the unified submit port is to detect an available processing engine of the plurality of processing engines and direct the received context to a slot of the set of queues for processing by the available processing engine.
-
公开(公告)号:US20220413899A1
公开(公告)日:2022-12-29
申请号:US17358882
申请日:2021-06-25
Applicant: Intel Corporation
Inventor: Vasanth Ranganathan , James Valerio , Joydeep Ray , Abhishek R. Appu , Alan Curtis , Prathamesh Raghunath Shinde , Brandon Fliflet , Ben J. Ashbaugh , John Wiegert
Abstract: An apparatus to facilitate barrier state save and restore for preemption in a graphics environment is disclosed. The apparatus includes processing resources to execute a plurality of execution threads that are comprised in a thread group (TG) and mid-thread preemption barrier save and restore hardware circuitry to: initiate an exception handling routine in response to a mid-thread preemption event, the exception handling routine to cause a barrier signaling event to be issued; receive indication of a valid designated thread status for a thread of a thread group (TG) in response to the barrier signaling event; and in response to receiving the indication of the valid designated thread status for the thread of the TG, cause, by the thread of the TG having the valid designated thread status, a barrier save routine and a barrier restore routine to be initiated for named barriers of the TG.
-
公开(公告)号:US11301384B2
公开(公告)日:2022-04-12
申请号:US17068754
申请日:2020-10-12
Applicant: Intel Corporation
Inventor: Joydeep Ray , James Valerio , Ben Ashbaugh , Lakshminarayanan Striramassarma
IPC: G06F12/0811 , G06F3/06 , G06F9/38 , G06F9/54
Abstract: Embodiments described herein provide a general purpose graphics processor comprising a plurality of tiles, each tile of the plurality of tiles comprising at least one execution unit, a local cache, and a cache control unit, and a high bandwidth memory communicatively coupled to the plurality of tiles, wherein the high bandwidth memory is shared between the plurality of tiles. The cache control unit is to implement a partial write management protocol to receive a partial write operation directed to a cache line in the local cache, the partial write operation comprising write data, write the data associated with the partial write operation to the local cache when the cache line is in a modified state, and forward the write data associated with the partial write operation to the high bandwidth memory when the partial write operation triggers a cache miss or when the cache line is in an exclusive state or a shared state. Other embodiments may be described and claimed.
-
公开(公告)号:US10997686B2
公开(公告)日:2021-05-04
申请号:US16243624
申请日:2019-01-09
Applicant: Intel Corporation
Inventor: Balaji Vembu , Brandon Fliflet , James Valerio , Michael Apodaca , Ben Ashbaugh , Hema Nalluri , Ankur Shah , Murali Ramadoss , David Puffer , Altug Koker , Aditya Navale , Abhishek R. Appu , Joydeep Ray , Travis Schluessler
Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
-
公开(公告)号:US10776897B1
公开(公告)日:2020-09-15
申请号:US16297129
申请日:2019-03-08
Applicant: Intel Corporation
Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Abhishek R. Appu , Ben J. Ashbaugh , Brandon Fliflet , Jeffery S. Boles , Srinivasan Embar Raghukrishnan , Rahul Kulkarni
Abstract: Embodiments described herein provide an apparatus comprising a processor to configure a plurality of contexts of a command engine to execute a graphics workload comprising a plurality of walkers, allocate, from a pool of execution units of a graphics processor, a subset of execution units to each walker in the plurality of walkers based at least in part on the predetermined number of walkers configured for the context, for each context in the plurality of contexts, dispatch one or more walkers of the plurality of walkers to the execution units, and upon dispatch of the one or more walkers of the plurality of walkers, write an opcode to a computer-readable memory indicating that the dispatch of the walker is complete, wherein the opcode comprises dependency data for the one or more walkers of the plurality of walkers. Other embodiments may be described and claimed.
-
-
-
-
-
-
-
-
-