-
Publication No.: US12124852B2
Publication Date: 2024-10-22
Application No.: US18347964
Filing Date: 2023-07-06
Applicant: Intel Corporation
Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Pradeep Ramani
CPC classification number: G06F9/3802 , G06F13/28 , G06T1/20
Abstract: A graphics processing device is provided that includes a set of compute units to execute a workload, a cache coupled with the set of compute units, and circuitry coupled with the cache and the set of compute units. The circuitry is configured to, in response to a cache miss for a read from a first cache, broadcast an event within the graphics processing device to identify data associated with the cache miss, receive the event at a second compute unit in the set of compute units, and prefetch the data identified by the event into a second cache that is local to the second compute unit before an attempt to read the instruction or data by a second thread.
-
Publication No.: US12056059B2
Publication Date: 2024-08-06
Application No.: US17590362
Filing Date: 2022-02-01
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Elmoustapha Ould-Ahmed-Vall , Abhishek Appu , Aravindh Anantaraman , Valentin Andrei , Durgaprasad Bilagi , Varghese George , Brent Insko , Sanjeev Jahagirdar , Scott Janus , Pattabhiraman K , SungYe Kim , Subramaniam Maiyuran , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Xinmin Tian
IPC: G06F12/00 , G06F12/0875 , G06F12/0891 , G06F12/123 , G06T1/60
CPC classification number: G06F12/123 , G06F12/0875 , G06F12/0891 , G06T1/60 , G06F2212/302
Abstract: Systems and methods for cache utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received. In one embodiment, the cache memory is configured to be partitioned into multiple cache regions, wherein the multiple cache regions include a first cache region having a cache eviction policy with a configurable level of data persistence.
-
Publication No.: US20240256483A1
Publication Date: 2024-08-01
Application No.: US18415052
Filing Date: 2024-01-17
Applicant: Intel Corporation
Inventor: Altug Koker , Varghese George , Aravindh Anantaraman , Valentin Andrei , Abhishek R. Appu , Niranjan Cooray , Nicolas Galoppo Von Borries , Mike MacPherson , Subramaniam Maiyuran , ElMoustapha Ould-Ahmed-Vall , David Puffer , Vasanth Ranganathan , Joydeep Ray , Ankur N. Shah , Lakshminarayanan Striramassarma , Prasoonkumar Surti , Saurabh Tangri
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
-
Publication No.: US12045658B2
Publication Date: 2024-07-23
Application No.: US17589689
Filing Date: 2022-01-31
Applicant: Intel Corporation
Inventor: Pawel Majewski , Prasoonkumar Surti , Karthik Vaidyanathan , Joshua Barczak , Vasanth Ranganathan , Vikranth Vemulapalli
CPC classification number: G06F9/5027 , G06F9/4881 , G06F9/54
Abstract: Apparatus and method for stack access throttling for synchronous ray tracing. For example, one embodiment of an apparatus comprises: ray tracing acceleration hardware to manage active ray tracing stack allocations to ensure that a size of the active ray tracing stack allocations remains within a threshold; and an execution unit to execute a thread to explicitly request a new ray tracing stack allocation from the ray tracing acceleration hardware, the ray tracing acceleration hardware to permit the new ray tracing stack allocation if the size of the active ray tracing stack allocations will remain within the threshold after permitting the new ray tracing stack allocation.
-
Publication No.: US12032496B2
Publication Date: 2024-07-09
Application No.: US18358550
Filing Date: 2023-07-25
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Michael Macpherson , Aravindh V. Anantaraman , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Varghese George , Abhishek Appu , Prasoonkumar Surti
CPC classification number: G06F13/1605 , G06F9/3004 , G06F9/3887 , G06F9/5016 , G06T1/20 , G06T1/60
Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
-
Publication No.: US12001209B2
Publication Date: 2024-06-04
Application No.: US17750917
Filing Date: 2022-05-23
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Joydeep Ray , Balaji Vembu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Sanjeev Jahagirdar , Vasanth Ranganathan
IPC: G06F9/48 , G05D1/00 , G06F9/52 , G06N3/04 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N3/084 , G06F9/46 , G06T1/20
CPC classification number: G05D1/0088 , G06F9/4881 , G06F9/522 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06F9/46 , G06T1/20
Abstract: A method of embodiments, as described herein, includes detecting thread groups relating to machine learning associated with one or more processing devices. The method may further include facilitating barrier synchronization of the thread groups across multiple dies such that each thread in a thread group is scheduled across a set of compute elements associated with the multiple dies, where each die represents a processing device of the one or more processing devices, the processing device including a graphics processor.
-
Publication No.: US11995029B2
Publication Date: 2024-05-28
Application No.: US17428527
Filing Date: 2020-03-14
Applicant: Intel Corporation
Inventor: Lakshminarayanan Striramassarma , Prasoonkumar Surti , Varghese George , Ben Ashbaugh , Aravindh Anantaraman , Valentin Andrei , Abhishek Appu , Nicolas Galoppo Von Borries , Altug Koker , Mike Macpherson , Subramaniam Maiyuran , Nilay Mistry , Elmoustapha Ould-Ahmed-Vall , Selvakumar Panneer , Vasanth Ranganathan , Joydeep Ray , Ankur Shah , Saurabh Tangri
IPC: G06F12/00 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/78 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
Abstract: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.
-
Publication No.: US11934342B2
Publication Date: 2024-03-19
Application No.: US17429277
Filing Date: 2020-03-14
Applicant: Intel Corporation
Inventor: Altug Koker , Varghese George , Aravindh Anantaraman , Valentin Andrei , Abhishek R. Appu , Niranjan Cooray , Nicolas Galoppo Von Borries , Mike MacPherson , Subramaniam Maiyuran , ElMoustapha Ould-Ahmed-Vall , David Puffer , Vasanth Ranganathan , Joydeep Ray , Ankur N. Shah , Lakshminarayanan Striramassarma , Prasoonkumar Surti , Saurabh Tangri
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06N3/08 , G06T15/06
CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06F9/3802 , G06F9/3818 , G06F9/3867 , G06F2212/1008 , G06F2212/1021 , G06F2212/1044 , G06F2212/302 , G06F2212/401 , G06F2212/455 , G06F2212/60 , G06N3/08 , G06T15/06
Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate pre-fetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the pre-fetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the pre-fetch information is based at least in part on the software assistance.
-
Publication No.: US11762662B2
Publication Date: 2023-09-19
Application No.: US17509726
Filing Date: 2021-10-25
Applicant: Intel Corporation
Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Pradeep Ramani
CPC classification number: G06F9/3802 , G06F13/28 , G06T1/20
Abstract: A graphics processing device comprises a set of compute units to execute multiple threads of a workload, a cache coupled with the set of compute units, and a prefetcher to prefetch instructions associated with the workload. The prefetcher is configured to use a thread dispatch command that is used to dispatch threads to execute a kernel to prefetch instructions, parameters, and/or constants that will be used during execution of the kernel. Prefetch operations for the kernel can then occur concurrently with thread dispatch operations.
-
Publication No.: US11755501B2
Publication Date: 2023-09-12
Application No.: US17212503
Filing Date: 2021-03-25
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Michael Macpherson , Aravindh V. Anantaraman , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Varghese George , Abhishek Appu , Prasoonkumar Surti
CPC classification number: G06F13/1605 , G06F9/3004 , G06F9/3887 , G06F9/5016 , G06T1/20 , G06T1/60
Abstract: An apparatus to facilitate efficient data sharing for graphics data processing operations is disclosed. The apparatus includes a processing resource to generate a stream of instructions, an L1 cache communicably coupled to the processing resource and comprising an on-page detector circuit to determine that a set of memory requests in the stream of instructions access a same memory page; and set a marker in a first request of the set of memory requests; and arbitration circuitry communicably coupled to the L1 cache, the arbitration circuitry to route the set of memory requests to memory comprising the memory page and to, in response to receiving the first request with the marker set, remain with the processing resource to process the set of memory requests.
-