-
Publication No.: US20220180467A1
Publication Date: 2022-06-09
Application No.: US17428534
Filing Date: 2020-03-14
Applicant: Intel Corporation
Inventor: Altug Koker, Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Sean Coleman, Nicolas Galoppo Von Borries, Varghese George, Pattabhiraman K, SungYe Kim, Mike Macpherson, Subramaniam Maiyuran, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, James Valerio
IPC: G06T1/20, G06F12/0804, G06F12/0811, G06T1/60
Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a first memory, a first memory side cache memory, a first communication fabric, and a first memory management unit (MMU). The graphics processor includes a second graphics processing unit (GPU) having a second memory, a second memory side cache memory, a second memory management unit (MMU), and a second communication fabric that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.
-
Publication No.: US11321262B2
Publication Date: 2022-05-03
Application No.: US17014023
Filing Date: 2020-09-08
Applicant: Intel Corporation
Inventor: Hema Chand Nalluri, Ankur Shah, Joydeep Ray, Aditya Navale, Altug Koker, Murali Ramadoss, Niranjan L. Cooray, Jeffery S. Boles, Aravindh Anantaraman, David Puffer, James Valerio, Vasanth Ranganathan
IPC: G06F9/52, G06F12/14, G06F13/40, G06F13/16, G06F12/0888, G06F12/0837, G06F9/30
Abstract: An apparatus to facilitate memory barriers is disclosed. The apparatus comprises an interconnect; a device memory; a plurality of processing resources, coupled to the device memory, to execute a plurality of execution threads as memory data producers and memory data consumers to a device memory and a system memory; and fence hardware to generate fence operations to enforce data ordering on memory operations issued to the device memory and a system memory coupled via the interconnect.
-
Publication No.: US11194722B2
Publication Date: 2021-12-07
Application No.: US15922809
Filing Date: 2018-03-15
Applicant: Intel Corporation
Inventor: Bharath Narasimha Swamy, Joydeep Ray, Rama Kishan Malladi, James Valerio, Abhishek Appu
IPC: G06F12/0842, G06F12/0855
Abstract: Apparatus and method for improved cache utilization and efficiency on a many-core processor. An apparatus comprising: a plurality of execution units to generate cache access requests responsive to executing instructions; a pending request queue to store pending cache access requests generated by the execution units; pending queue management circuitry to compare a current cache access request with entries in the pending request queue to determine whether the current cache access request can be merged with an entry in the pending request queue and, if so, to merge the current cache access request with the entry.
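The merging behavior described in this abstract can be illustrated with a minimal software sketch. The abstract does not state the merge criterion, so the sketch assumes that two requests can be merged when they fall in the same cache line; `LINE_SIZE`, `PendingEntry`, `PendingRequestQueue`, and `submit` are hypothetical names introduced only for illustration and do not come from the patent.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # assumed cache-line size in bytes; not stated in the abstract


@dataclass
class PendingEntry:
    line_addr: int                                   # base address of the cache line
    requesters: list = field(default_factory=list)   # units waiting on this line


class PendingRequestQueue:
    """Toy model of a pending request queue with request merging."""

    def __init__(self):
        self.entries = []

    def submit(self, requester_id, addr):
        """Queue a request; return True if it was merged into an existing entry."""
        line_addr = addr // LINE_SIZE * LINE_SIZE
        # Compare the current request with every pending entry.
        for entry in self.entries:
            if entry.line_addr == line_addr:  # assumed merge criterion: same line
                entry.requesters.append(requester_id)
                return True
        # No match: allocate a new pending entry.
        self.entries.append(PendingEntry(line_addr, [requester_id]))
        return False
```

In this model, two execution units that request addresses in the same line share one pending entry, so only one access would need to be issued to the cache for both of them.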
-
Publication No.: US11175949B2
Publication Date: 2021-11-16
Application No.: US16506730
Filing Date: 2019-07-09
Applicant: Intel Corporation
Inventor: Kiran C. Veernapu, Kamlesh Pillai, James Valerio, Joydeep Ray, Abhishek Appu
Abstract: A mechanism is described to facilitate microcontroller-based flexible thread scheduling launching in computing environments. An apparatus of embodiments, as described herein, includes facilitating a graphics processor hosting a microcontroller having a thread scheduling unit, and detection and observation logic to detect a scheduling algorithm associated with an application at the apparatus. The apparatus may further include reading and dispatching logic to facilitate the microcontroller to prepare a flexible dispatch routine based on the scheduling algorithm. The apparatus may further include scheduling and launching logic to facilitate the thread scheduling unit to dynamically schedule and launch threads based on the flexible dispatch routine, where the threads are hosted by the graphics processor.
-
Publication No.: US20210241418A1
Publication Date: 2021-08-05
Application No.: US17234039
Filing Date: 2021-04-19
Applicant: Intel Corporation
Inventor: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
-
Publication No.: US10802967B1
Publication Date: 2020-10-13
Application No.: US16457088
Filing Date: 2019-06-28
Applicant: Intel Corporation
Inventor: Joydeep Ray, James Valerio, Ben Ashbaugh, Lakshminarayanan Striramassarma
IPC: G06F12/0815, G06F12/0811, G06F3/06, G06F9/38, G06F9/54
Abstract: Embodiments described herein provide a general purpose graphics processor comprising a plurality of tiles, each tile of the plurality of tiles comprising at least one execution unit, a local cache, and a cache control unit, and a high bandwidth memory communicatively coupled to the plurality of tiles, wherein the high bandwidth memory is shared between the plurality of tiles. The cache control unit is to implement a partial write management protocol to receive a partial write operation directed to a cache line in the local cache, the partial write operation comprising write data, write the data associated with the partial write operation to the local cache when the cache line is in a modified state, and forward the write data associated with the partial write operation to the high bandwidth memory when the partial write operation triggers a cache miss or when the cache line is in an exclusive state or a shared state. Other embodiments may be described and claimed.
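The decision rule in this abstract (write locally when the line is modified, otherwise forward to the high bandwidth memory) can be sketched as a small function. This is an illustrative model only: `State`, `partial_write`, `LINE_SIZE`, and the dictionary-based cache and memory are hypothetical, and the abstract does not say what happens to the local copy of an exclusive or shared line after forwarding, so the sketch does not model that step.

```python
from enum import Enum

LINE_SIZE = 64  # assumed cache-line size in bytes; not stated in the abstract


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"


def partial_write(local_cache, hbm, line_addr, offset, data):
    """Apply a partial write and report where the data went.

    local_cache maps line address -> (State, bytearray); a missing key
    models a cache miss. hbm maps line address -> bytearray.
    """
    entry = local_cache.get(line_addr)
    if entry is not None and entry[0] is State.MODIFIED:
        # Line is already modified locally: write into the local cache.
        entry[1][offset:offset + len(data)] = data
        return "local"
    # Miss, exclusive, or shared: forward the write data to shared memory.
    line = hbm.setdefault(line_addr, bytearray(LINE_SIZE))
    line[offset:offset + len(data)] = data
    return "forwarded"
```

For example, calling `partial_write` on a line held in the `SHARED` state returns `"forwarded"` and leaves the local bytes untouched in this model, while the same call on a `MODIFIED` line returns `"local"` and never touches `hbm`.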
-
Publication No.: US20200219223A1
Publication Date: 2020-07-09
Application No.: US16243624
Filing Date: 2019-01-09
Applicant: Intel Corporation
Inventor: Balaji Vembu, Brandon Fliflet, James Valerio, Michael Apodaca, Ben Ashbaugh, Hema Nalluri, Ankur Shah, Murali Ramadoss, David Puffer, Altug Koker, Aditya Navale, Abhishek R. Appu, Joydeep Ray, Travis Schluessler
Abstract: Embodiments described herein provide a graphics, media, and compute device having a tiled architecture composed of a number of tiles of smaller graphics devices. The work distribution infrastructure for such device enables the distribution of workloads across multiple tiles of the device. Work items can be submitted to any one or more of the multiple tiles, with workloads able to span multiple tiles. Additionally, upon completion of a work item, graphics, media, and/or compute engines within the device can readily acquire new work items for execution with minimal latency.
-