-
公开(公告)号:US20240028404A1
公开(公告)日:2024-01-25
申请号:US18328591
申请日:2023-06-02
Applicant: Intel Corporation
Inventor: Ben Ashbaugh , Jonathan Pearce , Murali Ramadoss , Vikranth Vemulapalli , William B. Sadler , Sungye Kim , Marian Alin Petre
CPC classification number: G06F9/5027 , G06F9/3851 , G06F9/3877 , G06F9/5033 , G06F9/545 , G06F12/0837 , G06F9/4881 , G06F9/5066 , G06F9/3885 , G06F9/3455 , G06F9/3887 , G06T1/60
Abstract: Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors, wherein the one or more processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling of the plurality of groups of threads including the plurality of processors to apply a bias for scheduling the plurality of groups of threads according to a cache locality for the one or more caches.
-
公开(公告)号:US11762804B2
公开(公告)日:2023-09-19
申请号:US17868448
申请日:2022-07-19
Applicant: Intel Corporation
Inventor: Joydeep Ray , Aravindh Anantaraman , Abhishek R. Appu , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Subramaniam Maiyuran , Nicolas Galoppo Von Borries , Varghese George , Mike MacPherson , Ben Ashbaugh , Murali Ramadoss , Vikranth Vemulapalli , William Sadler , Jonathan Pearce , Sungye Kim
CPC classification number: G06F15/8069 , G06F9/30163 , G06F9/3877 , G06T15/005 , G06F9/3836
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20230051190A1
公开(公告)日:2023-02-16
申请号:US17865666
申请日:2022-07-15
Applicant: Intel Corporation
Inventor: Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, JR. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei
IPC: G06F12/0862 , G06T1/20 , G06T1/60
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
-
公开(公告)号:US20220261347A1
公开(公告)日:2022-08-18
申请号:US17732308
申请日:2022-04-28
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Ben Ashbaugh , Jonathan Pearce , Abhishek Appu , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Elmoustapha Ould-Ahmed-Vall , Aravindh Anantaraman , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Yoav Harel , Arthur Hunter,, JR. , Brent Insko , Scott Janus , Pattabhiraman K , Mike Macpherson , Subramaniam Maiyuran , Marian Alin Petre , Murali Ramadoss , Shailesh Shah , Kamal Sinha , Prasoonkumar Surti , Vikranth Vemulapalli
IPC: G06F12/0802 , G06F9/30
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
-
公开(公告)号:US11409693B2
公开(公告)日:2022-08-09
申请号:US17321885
申请日:2021-05-17
Applicant: Intel Corporation
Inventor: Joydeep Ray , Aravindh Anantaraman , Abhishek R. Appu , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Subramaniam Maiyuran , Nicolas Galoppo Von Borries , Varghese George , Mike MacPherson , Ben Ashbaugh , Murali Ramadoss , Vikranth Vemulapalli , William Sadler , Jonathan Pearce , Sungye Kim
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20220179787A1
公开(公告)日:2022-06-09
申请号:US17428530
申请日:2020-03-14
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Ben Ashbaugh , Jonathan Pearce , Abhishek Appu , Vasanth Ranganathan , Lakshminarayanan Striramassarma , Elmoustapha Ould-Ahmed-Vall , Aravindh Anantaraman , Valentin Andrei , Nicolas Galoppo Von Borries , Varghese George , Yoav Harel , Arthur Hunter, Jr. , Brent Insko , Scott Janus , Pattabhiraman K , Mike Macpherson , Subramaniam Maiyuran , Marian Alin Petre , Murali Ramadoss , Shailesh Shah , Kamal Sinha , Prasoonkumar Surti , Vikranth Vemulapalli
IPC: G06F12/0802 , G06F9/30
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
-
公开(公告)号:US11016929B2
公开(公告)日:2021-05-25
申请号:US16354782
申请日:2019-03-15
Applicant: Intel Corporation
Inventor: Joydeep Ray , Aravindh Anantaraman , Abhishek R. Appu , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Subramaniam Maiyuran , Nicolas Galappo Von Borries , Varghese George , Mike Macpherson , Ben Ashbaugh , Murali Ramadoss , Vikranth Vemulapalli , William Sadler , Jonathan Pearce , Sungye Kim
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US10831505B2
公开(公告)日:2020-11-10
申请号:US16147692
申请日:2018-09-29
Applicant: Intel Corporation
Inventor: Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr , Abhijit Davare , Andrey Ayupov
Abstract: An apparatus and method for data parallel single program multiple data (SPMD) execution. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions of one or more primary threads; a decoder to decode the instructions to generate uops; a data parallel cluster (DPC) to execute microthreads comprising a subset of the uops, the DPC further comprising: a plurality of execution lanes to perform parallel execution of the microthreads; an instruction decode queue (IDQ) to store the uops prior to execution; and a scheduler to evaluate the microthreads based on associated variables including instruction pointer (IP) values, the scheduler to gang microthreads into fragments for parallel execution on the execution lanes based on the evaluation.
-
-
-
-
-
-
-