-
公开(公告)号:US20220261347A1
公开(公告)日:2022-08-18
申请号:US17732308
申请日:2022-04-28
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Ben Ashbaugh , Jonathan Pearce , Abhishek Appu , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Elmoustapha Ould-Ahmed-Vall , Aravindh Anantaraman , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Yoav Harel , Arthur Hunter,, JR. , Brent Insko , Scott Janus , Pattabhiraman K , Mike Macpherson , Subramaniam Maiyuran , Marian Alin Petre , Murali Ramadoss , Shailesh Shah , Kamal Sinha , Prasoonkumar Surti , Vikranth Vemulapalli
IPC: G06F12/0802 , G06F9/30
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
-
公开(公告)号:US11409693B2
公开(公告)日:2022-08-09
申请号:US17321885
申请日:2021-05-17
Applicant: Intel Corporation
Inventor: Joydeep Ray , Aravindh Anantaraman , Abhishek R. Appu , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Subramaniam Maiyuran , Nicolas Galoppo Von Borries , Varghese George , Mike MacPherson , Ben Ashbaugh , Murali Ramadoss , Vikranth Vemulapalli , William Sadler , Jonathan Pearce , Sungye Kim
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20220222767A1
公开(公告)日:2022-07-14
申请号:US17580352
申请日:2022-01-20
Applicant: Intel Corporation
Inventor: Joydeep Ray , Aravindh Anantaraman , Valentin Andrei , Abhishek R. Appu , Nicolas Galoppo von Borries , Varghese George , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Mike Macpherson , Subramaniam Maiyuran
Abstract: Embodiments are generally directed to memory prefetching in multiple GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher of each of the GPUs is to prefetch data from the memory to the cache of the GPU; and wherein the prefetcher of a GPU is prohibited from prefetching from a page that is not owned by the GPU or by the host processor.
-
公开(公告)号:US20220180467A1
公开(公告)日:2022-06-09
申请号:US17428534
申请日:2020-03-14
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Aravindh Anantaraman , Valentin Andrei , Abhishek Appu , Sean Coleman , Nicolas Galoppo Von Borries , Varghese George , Pattabhiraman K , SungYe Kim , Mike Macpherson , Subramaniam Maiyuran , Elmoustapha Ould-Ahmed-Vall , Vasanth Ranganathan , James Valerio
IPC: G06T1/20 , G06F12/0804 , G06F12/0811 , G06T1/60
Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a first memory, a first memory side cache memory, a first communication fabric, and a first memory management unit (MMU). The graphics processor includes a second graphics processing unit (GPU) having a second memory, a second memory side cache memory, a second memory management unit (MMU), and a second communication fabric that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.
-
公开(公告)号:US20220179787A1
公开(公告)日:2022-06-09
申请号:US17428530
申请日:2020-03-14
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Ben Ashbaugh , Jonathan Pearce , Abhishek Appu , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Elmoustapha Ould-Ahmed-Vall , Aravindh Anantaraman , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Yoav Harel , Arthur Hunter, Jr. , Brent Insko , Scott Janus , Pattabhiraman K , Mike Macpherson , Subramaniam Maiyuran , Marian Alin Petre , Murali Ramadoss , Shailesh Shah , Kamal Sinha , Prasoonkumar Surti , Vikranth Vemulapalli
IPC: G06F12/0802 , G06F9/30
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
-
公开(公告)号:US20220164917A1
公开(公告)日:2022-05-26
申请号:US17544826
申请日:2021-12-07
Applicant: Intel Corporation
Inventor: Aravindh Anantaraman , Altug Koker , Varghese George , Subramaniam Maiyuran , SungYe Kim , Valentin Andrei
Abstract: Apparatuses including general-purpose graphics processing units and graphics multiprocessors that exploit queues or transitional buffers for improved low-latency high-bandwidth on-die data retrieval are disclosed. In one embodiment, a graphics multiprocessor includes at least one compute engine to provide a request, a queue or transitional buffer, and logic coupled to the queue or transitional buffer. The logic is configured to cause a request to be transferred to a queue or transitional buffer for temporary storage without processing the request and to determine whether the queue or transitional buffer has a predetermined amount of storage capacity.
-
公开(公告)号:US20220156202A1
公开(公告)日:2022-05-19
申请号:US17590362
申请日:2022-02-01
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Elmoustapha Ould-Ahmed-Vall , Abhishek Appu , Aravindh Anantaraman , Valentin Andrei , Durgaprasad Bilagi , Varghese George , Brent Insko , Sanjeev Jahagirdar , Scott Janus , Pattabhiraman K , SungYe Kim , Subramaniam Maiyuran , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Xinmin Tian
IPC: G06F12/123 , G06F12/0891 , G06T1/60 , G06F12/0875
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received. In one embodiment, the cache memory configured to be partitioned into multiple cache regions, wherein the multiple cache regions include a first cache region having a cache eviction policy with a configurable level of data persistence.
-
公开(公告)号:US20220138101A1
公开(公告)日:2022-05-05
申请号:US17430611
申请日:2020-03-14
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Aravindh Anantaraman , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Altug Koker , Mike Macpherson , Subramaniam Maiyuran , Joydeep Ray , Lakshminarayana Pappu , Guadalupe Garcia
IPC: G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0875 , G06F12/02 , G06F9/30 , G06T15/06
Abstract: Methods and apparatus relating to memory controller techniques. In an example, an apparatus comprises a cache memory, a high-bandwidth memory, and a processor communicatively coupled to the cache memory and the high-bandwidth memory, the processor to manage data transfer between the cache memory and the high-bandwidth memory for memory access operations directed to the high-bandwidth memory. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20220129521A1
公开(公告)日:2022-04-28
申请号:US17428233
申请日:2020-03-14
Applicant: INTEL CORPORATION
Inventor: Prasoonkumar Surti , Subramaniam Maiyuran , Valentin Andrei , Abhishek Appu , Varghese George , Altug Koker , Mike Macpherson , Elmoustapha Ould-Ahmed-Vall , Vasanth Ranganathan , Joydeep Ray , Lakshminarayanan Striramassarma , SungYe Kim
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides techniques to optimize training and inference on a systolic array when using sparse data. One embodiment provides techniques to use decompression information when performing sparse compute operations. One embodiment enables the disaggregation of special function compute arrays via a shared reg file. One embodiment enables packed data compress and expand operations on a GPGPU. One embodiment provides techniques to exploit block sparsity within the cache hierarchy of a GPGPU.
-
公开(公告)号:US11119820B2
公开(公告)日:2021-09-14
申请号:US16354957
申请日:2019-03-15
Applicant: Intel Corporation
Inventor: Valentin Andrei , Aravindh Anantaraman , Abhishek R. Appu , Nicolas C. Galoppo von Borries , Altug Koker , SungYe Kim , Elmoustapha Ould-Ahmed-Vall , Mike Macpherson , Subramaniam Maiyuran , Vasanth Ranganathan , Joydeep Ray , Varghese George
Abstract: One embodiment provides for a general-purpose graphics processing unit comprising a set of processing elements to execute one or more thread groups of a second kernel to be executed by the general-purpose graphics processor, an on-chip memory coupled to the set of processing elements, and a scheduler coupled with the set of processing elements, the scheduler to schedule the thread groups of the kernel to the set of processing elements, wherein the scheduler is to schedule a thread group of the second kernel to execute subsequent to a thread group of a first kernel, the thread group of the second kernel configured to access a region of the on-chip memory that contains data written by the thread group of the first kernel in response to a determination that the second kernel is dependent upon the first kernel.
-
-
-
-
-
-
-
-
-