-
Publication No.: WO2020190812A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022850
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: RANGANATHAN, Vasanth , APPU, Abhishek R. , ASHBAUGH, Ben , DOYLE, Peter , FLIFLET, Brandon , HUNTER, Arthur , INSKO, Brent , JANUS, Scott , KOKER, Altug , NAVALE, Aditya , RAY, Joydeep , SINHA, Kamal , STRIRAMASSARMA, Lakshminarayanan , SURTI, Prasoonkumar , VALERIO, James
IPC: G06F9/50
Abstract: Embodiments are generally directed to compute optimization in graphics processing. An embodiment of an apparatus includes one or more processors including a multi-tile graphics processing unit (GPU) to process data, the multi-tile GPU including multiple processor tiles; and a memory for storage of data for processing, wherein the apparatus is to receive compute work for processing by the GPU, partition the compute work into multiple work units, assign each of the multiple work units to one of the processor tiles, and process the compute work using the processor tiles assigned to the work units.
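Illustrative sketch (not part of the publication): the receive, partition, assign, and process sequence in the abstract can be modeled roughly as below. The names `partition`, `assign`, and `process`, the round-robin assignment, the `units_per_tile` parameter, and the use of host threads to stand in for processor tiles are all assumptions made for illustration; the abstract does not specify any of them.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, num_units):
    """Split a list into num_units contiguous chunks of near-equal size."""
    base, extra = divmod(len(work), num_units)
    units, start = [], 0
    for i in range(num_units):
        size = base + (1 if i < extra else 0)
        units.append(work[start:start + size])
        start += size
    return units


def assign(num_units, num_tiles):
    """Round-robin assignment (an assumption): unit i goes to tile i % num_tiles."""
    return [i % num_tiles for i in range(num_units)]


def process(work, num_tiles, units_per_tile=2, kernel=lambda x: x * x):
    """Partition the work, assign units to tiles, and run each tile's units."""
    units = partition(work, num_tiles * units_per_tile)
    owner = assign(len(units), num_tiles)
    # Group unit indices by the tile that owns them.
    per_tile = [[i for i, t in enumerate(owner) if t == tile]
                for tile in range(num_tiles)]

    def run_tile(unit_ids):
        # Each hypothetical tile applies the kernel to its own units only.
        return {i: [kernel(x) for x in units[i]] for i in unit_ids}

    results = {}
    with ThreadPoolExecutor(max_workers=num_tiles) as pool:
        for partial in pool.map(run_tile, per_tile):
            results.update(partial)
    # Reassemble the outputs in the original unit order.
    return [y for i in range(len(units)) for y in results[i]]
```

Creating more work units than tiles is one possible design choice: it lets a simple assignment policy even out the load when units differ in cost.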
-
Publication No.: WO2020190429A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/017897
Filing Date: 2020-02-12
Applicant: INTEL CORPORATION , VEMULAPALLI, Vikranth , STRIRAMASSARMA, Lakshminarayanan , MACPHERSON, Mike , ANANTARAMAN, Aravindh , ASHBAUGH, Ben , RAMADOSS, Murali , SADLER, William B. , PEARCE, Jonathan , JANUS, Scott , INSKO, Brent , RANGANATHAN, Vasanth , SINHA, Kamal , HUNTER, Arthur , SURTI, Prasoonkumar , GALOPPO VON BORRIES, Nicolas , RAY, Joydeep , APPU, Abhisek R. , OULD-AHMED-VALL, ElMoustapha , KOKER, Altug , KIM, Sungye , MAIYURAN, Subramaniam , ANDREI, Valentin
Inventor: VEMULAPALLI, Vikranth , STRIRAMASSARMA, Lakshminarayanan , MACPHERSON, Mike , ANANTARAMAN, Aravindh , ASHBAUGH, Ben , RAMADOSS, Murali , SADLER, William B. , PEARCE, Jonathan , JANUS, Scott , INSKO, Brent , RANGANATHAN, Vasanth , SINHA, Kamal , HUNTER, Arthur , SURTI, Prasoonkumar , GALOPPO VON BORRIES, Nicolas , RAY, Joydeep , APPU, Abhisek R. , OULD-AHMED-VALL, ElMoustapha , KOKER, Altug , KIM, Sungye , MAIYURAN, Subramaniam , ANDREI, Valentin
IPC: G06F12/0862 , G06F12/0897 , G06F12/0888 , G06F9/38
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs, including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache; and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
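Illustrative sketch (not part of the publication): the threshold rule in the abstract reduces to a comparison between a measured L1 hit rate and a threshold. The class name `PrefetchPolicy`, the default threshold of 0.75, and the choice to treat an empty measurement window as a hit rate of 0.0 are assumptions for illustration only.

```python
class PrefetchPolicy:
    """Chooses a prefetch destination from the measured L1 hit rate."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold  # hypothetical value; not given in the abstract
        self.hits = 0
        self.accesses = 0

    def record_access(self, hit):
        """Count one L1 access and whether it hit."""
        self.accesses += 1
        if hit:
            self.hits += 1

    def hit_rate(self):
        # Assumption: with no samples yet, report 0.0 so prefetch to L1 is allowed.
        return self.hits / self.accesses if self.accesses else 0.0

    def prefetch_target(self):
        # Hit rate >= threshold: L1 is already effective, so limit prefetch to L3.
        # Hit rate < threshold: allow prefetched data into L1.
        return "L3" if self.hit_rate() >= self.threshold else "L1"
```

One reading of the rule is that a high L1 hit rate means prefetched lines would mostly displace useful data, so they are kept in L3 instead.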
-
Publication No.: WO2020190422A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/017521
Filing Date: 2020-02-10
Applicant: INTEL CORPORATION , RAMADOSS, Murali , VEMULAPALLI, Vikranth , COORAY, Niran , SADLER, William B. , PEARCE, Jonathan D. , PETRE, Marian Alin , ASHBAUGH, Ben , OULD-AHMED-VALL, ElMoustapha , GALOPPO VON BORRIES, Nicolas , KOKER, Altug , ANANTARAMAN, Aravindh , MAIYURAN, Subramaniam , GEORGE, Varghese , KIM, Sungye , VALENTIN, Andrei
Inventor: RAMADOSS, Murali , VEMULAPALLI, Vikranth , COORAY, Niran , SADLER, William B. , PEARCE, Jonathan D. , PETRE, Marian Alin , ASHBAUGH, Ben , OULD-AHMED-VALL, ElMoustapha , GALOPPO VON BORRIES, Nicolas , KOKER, Altug , ANANTARAMAN, Aravindh , MAIYURAN, Subramaniam , GEORGE, Varghese , KIM, Sungye , VALENTIN, Andrei
Abstract: Methods and apparatus relating to predictive page fault handling. In an example, an apparatus comprises a processor to receive a virtual address that triggered a page fault for a compute process, check a virtual memory space for a virtual memory allocation for the compute process that triggered the page fault and manage the page fault according to one of a first protocol in response to a determination that the virtual address that triggered the page fault is a last page in the virtual memory allocation for the compute process, or a second protocol in response to a determination that the virtual address that triggered the page fault is not a last page in the virtual memory allocation for the compute process. Other embodiments are also disclosed and claimed.
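Illustrative sketch (not part of the publication): the abstract selects between two protocols depending on whether the faulting virtual address lies in the last page of the allocation. The sketch below only performs that classification; what each protocol does is not stated in the abstract and is not modeled. `PAGE_SIZE`, `Allocation`, `classify_fault`, and the page-aligned base address are assumptions.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes


@dataclass(frozen=True)
class Allocation:
    base: int  # assumed to be page-aligned
    size: int  # allocation length in bytes

    def contains(self, va):
        return self.base <= va < self.base + self.size

    def last_page_index(self):
        return (self.size - 1) // PAGE_SIZE


def classify_fault(va, allocations):
    """Return 'first' if va is in the last page of its allocation,
    'second' if it is in any earlier page, or None if va is unmapped."""
    for alloc in allocations:
        if alloc.contains(va):
            page = (va - alloc.base) // PAGE_SIZE
            return "first" if page == alloc.last_page_index() else "second"
    return None
```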
-
Publication No.: WO2020190801A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022839
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: RAY, Joydeep , PANNEER, Selvakumar , TANGRI, Saurabh , ASHBAUGH, Ben , JANUS, Scott , APPU, Abhishek , GEORGE, Varghese , IYER, Ravishankar , JAIN, Nilesh , K, Pattabhiraman , KOKER, Altug , MACPHERSON, Mike , MASTRONARDE, Josh , OULD-AHMED-VALL, Elmoustapha , S, Jayakrishna P. , SAMSON, Eric
IPC: G06F9/38 , G06F12/0862 , G06F9/30 , G06F12/06
Abstract: Embodiments described herein include software, firmware, and hardware that provide techniques to enable deterministic scheduling across multiple general-purpose graphics processing units. One embodiment provides a multi-GPU architecture with uniform latency. One embodiment provides techniques to distribute memory output based on memory chip thermals. One embodiment provides techniques to enable thermally aware workload scheduling. One embodiment provides techniques to enable end-to-end contracts for workload scheduling on multiple GPUs.
-
Publication No.: WO2020190431A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/017995
Filing Date: 2020-02-12
Applicant: INTEL CORPORATION , ASHBAUGH, Ben , PEARCE, Jonathan , RAMADOSS, Murali , VEMULAPALLI, Vikranth , SADLER, William B. , KIM, Sungye , PETRE, Marian Alin
Inventor: ASHBAUGH, Ben , PEARCE, Jonathan , RAMADOSS, Murali , VEMULAPALLI, Vikranth , SADLER, William B. , KIM, Sungye , PETRE, Marian Alin
Abstract: Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors, wherein the plurality of processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling of the plurality of groups of threads including the plurality of processors applying a bias for scheduling the plurality of groups of threads according to a cache locality for the one or more caches.
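Illustrative sketch (not part of the publication): one way to read "apply a bias ... according to a cache locality" is a load-balancing scheduler whose cost for a processor is reduced when that processor's cache already holds the data a thread group will use. The function `schedule`, the cost formula, the `bias` parameter, and the representation of thread groups as `(name, buffer_id)` pairs are all assumptions.

```python
def schedule(groups, num_procs, bias=1.5):
    """Place each (name, buffer_id) thread group on a processor.

    Cost of a processor = its current load, minus `bias` if its cache
    is assumed to already hold the group's buffer. Lowest cost wins;
    ties go to the lowest-numbered processor.
    """
    load = [0] * num_procs
    cached = [set() for _ in range(num_procs)]
    placement = {}
    for name, buffer_id in groups:
        costs = [load[p] - (bias if buffer_id in cached[p] else 0.0)
                 for p in range(num_procs)]
        best = costs.index(min(costs))
        placement[name] = best
        load[best] += 1
        cached[best].add(buffer_id)  # running the group warms this cache
    return placement
```

With `bias=0` the sketch degenerates to plain least-loaded placement, so the bias value controls how much load imbalance is tolerated in exchange for cache reuse.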
-
Publication No.: WO2020190425A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/017743
Filing Date: 2020-02-11
Applicant: INTEL CORPORATION , RAY, Joydeep , ANANTARAMAN, Aravindh , APPU, Abhishek R. , KOKER, Altug , OULD-AHMED-VALL, ElMoustapha , ANDREI, Valentin , MAIYURAN, Subramaniam , GALOPPO VON BORRIES, Nicolas , MACPHERSON, Mike , ASHBAUGH, Ben , RAMADOSS, Murali , VEMULAPALLI, Vikranth , SADLER, William , PEARCE, Jonathan , KIM, Sungye , GEORGE, Varghese
Inventor: RAY, Joydeep , ANANTARAMAN, Aravindh , APPU, Abhishek R. , KOKER, Altug , OULD-AHMED-VALL, ElMoustapha , ANDREI, Valentin , MAIYURAN, Subramaniam , GALOPPO VON BORRIES, Nicolas , MACPHERSON, Mike , ASHBAUGH, Ben , RAMADOSS, Murali , VEMULAPALLI, Vikranth , SADLER, William , PEARCE, Jonathan , KIM, Sungye , GEORGE, Varghese
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, and assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
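Illustrative sketch (not part of the publication): the abstract divides a workload into a subset suited to a scalar processor complex and a subset suited to a vector processor complex, but does not say how suitability is determined. The sketch below assumes a hypothetical criterion (an operation touching a single element is "scalar", anything wider is "vector"); the `Op` type and `split_workload` function are likewise invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    width: int  # number of data elements the operation touches (assumed field)


def split_workload(ops):
    """Return (scalar_ops, vector_ops), preserving the original order.

    Assumed rule: width == 1 is suitable for the scalar complex,
    width > 1 is suitable for the vector complex.
    """
    scalar_ops = [op for op in ops if op.width == 1]
    vector_ops = [op for op in ops if op.width > 1]
    return scalar_ops, vector_ops
```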
-
Publication No.: WO2020190810A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022848
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: KOKER, Altug , ASHBAUGH, Ben , JANUS, Scott , ANANTARAMAN, Aravindh , APPU, Abhishek R. , COORAY, Niran , GEORGE, Varghese , HUNTER, Arthur , INSKO, Brent , OULD-AHMED-VALL, ElMoustapha , PANNEER, Selvakumar , RANGANATHAN, Vasanth , RAY, Joydeep , SINHA, Kamal , STRIRAMASSARMA, Lakshminarayanan , SURTI, Prasoonkumar , TANGRI, Saurabh
IPC: G06F12/0804 , G06F12/0893 , G06F15/173
Abstract: Embodiments are generally directed to a multi-tile architecture for graphics operations. An embodiment of an apparatus includes a multi-tile architecture for graphics operations including a multi-tile graphics processor, the multi-tile graphics processor including one or more dies; multiple processor tiles installed on the one or more dies; and a structure to interconnect the processor tiles on the one or more dies, wherein the structure is to enable communications between the processor tiles.
-
Publication No.: WO2020190799A2
Publication Date: 2020-09-24
Application No.: PCT/US2020/022837
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: KOKER, Altug , RAY, Joydeep , ASHBAUGH, Ben , PEARCE, Jonathan , APPU, Abhishek , RANGANATHAN, Vasanth , STRIRAMASSARMA, Lakshminarayanan , OULD-AHMED-VALL, Elmoustapha , ANANTARAMAN, Aravindh , ANDREI, Valentin , GALOPPO VON BORRIES, Nicolas , GEORGE, Varghese , HAREL, Yoav , HUNTER, Arthur Jr. , INSKO, Brent , JANUS, Scott , K, Pattabhiraman , MACPHERSON, Mike , MAIYURAN, Subramaniam , PETRE, Marian Alin , RAMADOSS, Murali , SHAH, Shailesh , SINHA, Kamal , SURTI, Prasoonkumar , VEMULAPALLI, Vikranth
IPC: G06F9/38 , G06F12/0862 , G06F9/30 , G06F12/02 , G06F12/06 , G06F12/0804 , G06F12/0893 , G06F12/12 , G06F12/128 , G06F15/173 , G06F9/50
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
-
Publication No.: WO2020190798A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022836
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: STRIRAMASSARMA, Lakshminarayanan , SURTI, Prasoonkumar , GEORGE, Varghese , ASHBAUGH, Ben , ANANTARAMAN, Aravindh , ANDREI, Valentin , APPU, Abhishek , GALOPPO VON BORRIES, Nicolas , KOKER, Altug , MACPHERSON, Mike , MAIYURAN, Subramaniam , MISTRY, Nilay , OULD-AHMED-VALL, Elmoustapha , PANNEER, Selvakumar , RANGANATHAN, Vasanth , RAY, Joydeep , SHAH, Ankur , TANGRI, Saurabh
IPC: G06F9/38 , G06F12/0862 , G06F9/30
Abstract: Multi-tile memory management for detecting cross-tile access, providing multi-tile inference scaling with multicasting of data via a copy operation, and providing page migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second GPU having a memory, and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross-tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross-tile memory accesses occur from the first GPU to the memory of the second GPU.
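Illustrative sketch (not part of the publication): the detection step can be pictured as a per-page counter of remote accesses that emits a message once the count reaches a threshold. The abstract does not define "frequent"; the fixed-count threshold, the `CrossTileMonitor` class, and the message tuple format are assumptions.

```python
from collections import Counter


class CrossTileMonitor:
    """Counts accesses from one GPU to memory owned by another GPU."""

    def __init__(self, threshold=4):
        self.threshold = threshold  # assumed definition of "frequent"
        self.counts = Counter()
        self.messages = []  # stands in for messages that initiate a data transfer

    def record_access(self, src_gpu, owner_gpu, page):
        """Record one access; return True if it triggered a transfer message."""
        if src_gpu == owner_gpu:
            return False  # local access, nothing to track
        key = (src_gpu, owner_gpu, page)
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            # Emit exactly one message per (source, owner, page) triple.
            self.messages.append(key)
            return True
        return False
```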
-
Publication No.: WO2020190426A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/017747
Filing Date: 2020-02-11
Applicant: INTEL CORPORATION , COORAY, Niran , SADLER, William B. , PEARCE, Jonathan D. , PETRE, Marian Alin , ASHBAUGH, Ben , RAMADOSS, Murali , VEMULAPALLI, Vikranth , SHAH, Ankur N. , SANKARAN, Rajesh
Inventor: COORAY, Niran , SADLER, William B. , PEARCE, Jonathan D. , PETRE, Marian Alin , ASHBAUGH, Ben , RAMADOSS, Murali , VEMULAPALLI, Vikranth , SHAH, Ankur N. , SANKARAN, Rajesh
Abstract: Methods and apparatus relating to transactional page fault handling. In an example, an apparatus comprises a processor to divide an execution thread of a graphics workload into a set of transactions which are to be executed atomically, initiate the execution of the thread, and manage the execution of the thread according to one of a first protocol in response to a determination that a page fault occurred in the execution of a transaction, or a second protocol in response to a determination that a page fault did not occur in the execution of a transaction. Other embodiments are also disclosed and claimed.
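Illustrative sketch (not part of the publication): the abstract divides a thread into atomically executed transactions and picks a protocol based on whether a page fault occurred during a transaction. The abstract does not describe either protocol. The sketch assumes that the fault protocol rolls back the transaction's partial updates, makes the page resident, and retries, and that the no-fault protocol simply commits. `PageFault`, `run_thread`, and `touch` are hypothetical names.

```python
class PageFault(Exception):
    """Raised by a transaction that touches a non-resident page."""

    def __init__(self, page):
        super().__init__(f"page fault on page {page}")
        self.page = page


def run_thread(transactions, state, resident_pages):
    """Run transactions in order; return a log of (index, outcome) events."""
    log = []
    for i, txn in enumerate(transactions):
        while True:
            snapshot = dict(state)  # saved so partial updates can be undone
            try:
                txn(state, resident_pages)
            except PageFault as fault:
                # Assumed fault protocol: roll back, service the fault, retry.
                state.clear()
                state.update(snapshot)
                resident_pages.add(fault.page)
                log.append((i, "fault"))
                continue
            # Assumed no-fault protocol: the transaction's updates stand.
            log.append((i, "commit"))
            break
    return log


def touch(page, key):
    """Build a transaction that updates state[key], then accesses `page`."""
    def txn(state, resident_pages):
        state[key] = state.get(key, 0) + 1  # partial update before the access
        if page not in resident_pages:
            raise PageFault(page)
    return txn
```

Because the faulting transaction is rolled back before the retry, its update is applied once rather than twice, which is the atomicity property the sketch is meant to show.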