-
Publication No.: WO2022271226A1
Publication Date: 2022-12-29
Application No.: PCT/US2022/020408
Filing Date: 2022-03-15
Applicant: INTEL CORPORATION
Inventor: PARRA, Jorge , FU, Fangwen , MAIYURAN, Subramaniam , GEORGE, Varghese , MACPHERSON, Mike , PAL, Supratim , GURRAM, Chandra , GANAPATHY, Sabareesh , AVANCHA, Sasikanth , VOOTURI, Dharma Teja , MELLEMPUDI, Naveen , DAS, Dipankar
IPC: G06F17/16 , G06F9/00 , G06F15/8046 , G06F7/523 , G06F7/5443 , G06F9/3001 , G06F9/30036
Abstract: A processing apparatus is described herein that includes a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements configured to perform processing operations on input matrix elements based on output sparsity metadata. The output sparsity metadata indicates to the multiple processing elements to bypass multiplication for a first row of elements of a second matrix and multiply a second row of elements of the second matrix with a column of matrix elements of a first matrix.
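For illustration only, the following minimal Python sketch mimics the described behavior in software: a per-row flag (a hypothetical encoding of the output sparsity metadata) tells the loop to bypass the multiply for masked-off rows of the second matrix, while unmasked rows are multiplied with the corresponding column of the first matrix. It is a sketch of the idea, not the claimed systolic-array hardware.

```python
import numpy as np

def sparse_aware_matmul(a, b, output_sparsity_mask):
    """Multiply a (M x K) by b (K x N), skipping work for rows of `b`
    whose metadata bit marks them as bypassed (hypothetical encoding:
    False means "bypass multiplication for this row of b")."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and len(output_sparsity_mask) == k
    result = np.zeros((m, n), dtype=a.dtype)
    for idx in range(k):
        if not output_sparsity_mask[idx]:
            continue  # metadata says: bypass multiplication for this row of b
        # multiply row idx of b with column idx of a (rank-1 update)
        result += np.outer(a[:, idx], b[idx, :])
    return result

if __name__ == "__main__":
    a = np.arange(6, dtype=float).reshape(2, 3)
    b = np.arange(12, dtype=float).reshape(3, 4)
    mask = np.array([False, True, True])  # skip the first row of b
    print(sparse_aware_matmul(a, b, mask))
```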
-
Publication No.: WO2020190799A2
Publication Date: 2020-09-24
Application No.: PCT/US2020/022837
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: KOKER, Altug , RAY, Joydeep , ASHBAUGH, Ben , PEARCE, Jonathan , APPU, Abhishek , RANGANATHAN, Vasanth , STRIRAMASSARMA, Lakshminarayanan , OULD-AHMED-VALL, Elmoustapha , ANANTARAMAN, Aravindh , ANDREI, Valentin , GALOPPO VON BORRIES, Nicolas , GEORGE, Varghese , HAREL, Yoav , HUNTER, Arthur Jr. , INSKO, Brent , JANUS, Scott , K, Pattabhiraman , MACPHERSON, Mike , MAIYURAN, Subramaniam , PETRE, Marian Alin , RAMADOSS, Murali , SHAH, Shailesh , SINHA, Kamal , SURTI, Prasoonkumar , VEMULAPALLI, Vikranth
IPC: G06F9/38 , G06F12/0862 , G06F9/30 , G06F12/02 , G06F12/06 , G06F12/0804 , G06F12/0893 , G06F12/12 , G06F12/128 , G06F15/173 , G06F9/50
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
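For illustration only, a minimal Python sketch of the described decision, assuming a hypothetical per-instruction override field; the actual mechanism is a hardware cache controller, not software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class CachePolicy(Enum):
    WRITE_BACK = "write_back"
    WRITE_THROUGH = "write_through"
    UNCACHED = "uncached"

@dataclass
class Instruction:
    # Hypothetical per-instruction cache-control field; None means the
    # instruction does not override the default settings.
    cache_override: Optional[CachePolicy] = None

class CacheController:
    """Toy controller that decides whether default settings or an
    instruction will control cache operations for an access."""

    def __init__(self, default_policy: CachePolicy = CachePolicy.WRITE_BACK):
        self.default_policy = default_policy

    def resolve_policy(self, instr: Instruction) -> CachePolicy:
        if instr.cache_override is not None:
            return instr.cache_override   # the instruction controls the cache operation
        return self.default_policy        # fall back to the default settings

if __name__ == "__main__":
    ctrl = CacheController()
    print(ctrl.resolve_policy(Instruction()).value)                      # write_back
    print(ctrl.resolve_policy(Instruction(CachePolicy.UNCACHED)).value)  # uncached
```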
-
Publication No.: WO2020190798A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022836
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: STRIRAMASSARMA, Lakshminarayanan , SURTI, Prasoonkumar , GEORGE, Varghese , ASHBAUGH, Ben , ANANTARAMAN, Aravindh , ANDREI, Valentin , APPU, Abhishek , GALOPPO VON BORRIES, Nicolas , KOKER, Altug , MACPHERSON, Mike , MAIYURAN, Subramaniam , MISTRY, Nilay , OULD-AHMED-VALL, Elmoustapha , PANNEER, Selvakumar , RANGANATHAN, Vasanth , RAY, Joydeep , SHAH, Ankur , TANGRI, Saurabh
IPC: G06F9/38 , G06F12/0862 , G06F9/30
Abstract: Multi-tile memory management for detecting cross-tile access, providing multi-tile inference scaling with multicasting of data via a copy operation, and providing page migration is disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory, and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross-tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when such frequent cross-tile memory accesses occur.
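For illustration only, a minimal Python sketch of the described detection logic, with a hypothetical per-page access-count threshold standing in for the hardware's notion of "frequent" cross-tile accesses.

```python
from collections import Counter

class CrossTileMonitor:
    """Toy model of a memory controller that counts accesses from the local
    GPU tile to pages resident on a remote tile and signals a data transfer
    (e.g. page migration) once accesses to a page become frequent."""

    def __init__(self, local_tile: int, threshold: int = 8):
        self.local_tile = local_tile
        self.threshold = threshold           # illustrative assumption
        self.remote_access_counts = Counter()

    def record_access(self, page: int, owner_tile: int):
        if owner_tile == self.local_tile:
            return None                      # local access, nothing to do
        self.remote_access_counts[page] += 1
        if self.remote_access_counts[page] >= self.threshold:
            self.remote_access_counts[page] = 0
            # "send a message to initiate a data transfer mechanism"
            return {"action": "migrate_page", "page": page, "to_tile": self.local_tile}
        return None

if __name__ == "__main__":
    mon = CrossTileMonitor(local_tile=0, threshold=3)
    for _ in range(3):
        msg = mon.record_access(page=42, owner_tile=1)
    print(msg)  # {'action': 'migrate_page', 'page': 42, 'to_tile': 0}
```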
-
Publication No.: WO2020190809A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022847
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: APPU, Abhishek , MAIYURAN, Subramaniam , MACPHERSON, Mike , FU, Fangwen , CHEN, Jiasheng , GEORGE, Varghese , RANGANATHAN, Vasanth , GARG, Ashutosh , RAY, Joydeep
Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
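For illustration only, a minimal Python sketch of a block-sparse dot product in the spirit of the abstract; the block-mask/values encoding is an assumption, not the claimed instruction format.

```python
import numpy as np

def block_sparse_dot(values, block_mask, dense, block_size):
    """Dot product of a block-sparse vector with a dense vector.

    `block_mask` has one flag per block saying which blocks of the sparse
    vector are non-zero; `values` stores only those non-zero blocks,
    concatenated. The encoding details are illustrative assumptions."""
    acc = 0.0
    stored = 0
    for blk, present in enumerate(block_mask):
        if not present:
            continue  # all-zero block: skip the multiplies entirely
        start = blk * block_size
        block = values[stored * block_size:(stored + 1) * block_size]
        acc += float(np.dot(block, dense[start:start + block_size]))
        stored += 1
    return acc

if __name__ == "__main__":
    dense = np.arange(8, dtype=float)           # length 8, two blocks of 4
    block_mask = [True, False]                  # second block is all zeros
    values = np.array([1.0, 2.0, 3.0, 4.0])     # only the first block is stored
    print(block_sparse_dot(values, block_mask, dense, block_size=4))  # 20.0
```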
-
Publication No.: WO2020190806A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022844
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: KOKER, Altug , GEORGE, Varghese , ANANTARAMAN, Aravindh , ANDREI, Valentin , APPU, Abhishek R. , COORAY, Niran , GALOPPO VON BORRIES, Nicolas , MACPHERSON, Mike , MAIYURAN, Subramaniam , OULD-AHMED-VALL, ElMoustapha , PUFFER, David , RANGANATHAN, Vasanth , RAY, Joydeep , SHAH, Ankur N. , STRIRAMASSARMA, Lakshminarayanan , SURTI, Prasoonkumar , TANGRI, Saurabh
IPC: G06F9/38 , G06F12/0862 , G06F9/30
Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate prefetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the prefetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the prefetch information is based at least in part on the software assistance.
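For illustration only, a minimal Python sketch of turning a just-produced result address plus a software-supplied hint into prefetch addresses; the hint format, prefetch depth, and cache-line size are assumptions.

```python
def generate_prefetch_addresses(result_address, software_hint, num_prefetches=4,
                                line_size=64):
    """Produce prefetch addresses from the address of a just-produced result,
    guided by a software-supplied hint (here, an expected stride in bytes)."""
    stride = software_hint.get("stride", line_size)
    base = result_address & ~(line_size - 1)   # align to a cache-line boundary
    return [base + (i + 1) * stride for i in range(num_prefetches)]

if __name__ == "__main__":
    # Software assistance says the access pattern walks memory in 256-byte strides.
    addrs = generate_prefetch_addresses(0x1000, {"stride": 256})
    print([hex(a) for a in addrs])  # ['0x1100', '0x1200', '0x1300', '0x1400']
```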
-
Publication No.: WO2020190804A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022842
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: APPU, Abhishek R. , ANANTARAMAN, Aravindh , OULD-AHMED-VALL, ElMoustapha , ANDREI, Valentin , GALOPPO VON BORRIES, Nicolas , GEORGE, Varghese , KOKER, Altug , MACPHERSON, Mike , MAIYURAN, Subramaniam , RAY, Joydeep , RANGANATHAN, Vasanth
IPC: G06F3/06 , G06F12/0895 , G06F7/58
Abstract: Methods and apparatus relating to data initialization techniques. In an example, an apparatus comprises a processor to read one or more metadata codes which map to one or more cache lines in a cache memory and invoke a random number generator to generate random numerical data for the one or more cache lines in response to a determination that the one or more metadata codes indicate that the cache lines are to contain random numerical data. Other embodiments are also disclosed and claimed.
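For illustration only, a minimal Python sketch of the described initialization, with hypothetical metadata code values; the real apparatus performs this per cache line in hardware.

```python
import random

# Hypothetical metadata encoding: one code per cache line.
META_ZERO = 0      # line is to contain zeros
META_RANDOM = 1    # line is to contain random numerical data
META_BACKED = 2    # line is backed by real data elsewhere

def initialize_cache_lines(metadata_codes, line_words=16, seed=None):
    """Return initialized contents for each cache line based on its metadata
    code, invoking a random number generator only for lines marked as random."""
    rng = random.Random(seed)
    lines = []
    for code in metadata_codes:
        if code == META_RANDOM:
            lines.append([rng.random() for _ in range(line_words)])
        elif code == META_ZERO:
            lines.append([0.0] * line_words)
        else:
            lines.append(None)  # would be fetched from memory in a real system
    return lines

if __name__ == "__main__":
    lines = initialize_cache_lines([META_ZERO, META_RANDOM], line_words=4, seed=0)
    print(lines[0], lines[1])
```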
-
Publication No.: WO2020190811A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022849
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: KOKER, Altug , STRIRAMASSARMA, Lakshminarayanan , ANANTARAMAN, Aravindh , ANDREI, Valentin , APPU, Abhishek R. , COLEMAN, Sean , GEORGE, Varghese , K, Pattabhiraman , MACPHERSON, Mike , MAIYURAN, Subramaniam , OULD-AHMED-VALL, ElMoustapha , RANGANATHAN, Vasanth , RAY, Joydeep , S, Jayakrishna P , SURTI, Prasoonkumar
IPC: G06F9/38 , G06F12/0862 , G06F9/30
Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data to be prefetched extending to the current overfetch boundary.
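For illustration only, a minimal Python sketch of miss-triggered overfetching up to a boundary. Treating the "current overfetch boundary" as an address limit, and the dictionary-based cache and memory, are assumptions of this toy model.

```python
def handle_read(cache, memory, address, overfetch_boundary, line_size=64):
    """On a cache miss, fetch the requested line and overfetch the following
    lines up to the current overfetch boundary; on a hit, just return data."""
    line = address // line_size * line_size
    if line in cache:
        return cache[line]                  # hit: no overfetch needed
    # Miss: fetch the requested line plus every line up to the boundary.
    fetch = line
    while fetch <= overfetch_boundary:
        cache[fetch] = memory.get(fetch, 0)  # overfetch from memory / higher-level cache
        fetch += line_size
    return cache[line]

if __name__ == "__main__":
    cache, memory = {}, {0: 10, 64: 11, 128: 12, 192: 13}
    handle_read(cache, memory, address=0, overfetch_boundary=128)
    print(sorted(cache))   # [0, 64, 128] -- lines fetched up to the boundary
```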
-
Publication No.: WO2020190807A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022845
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: SURTI, Prasoonkumar , MAIYURAN, Subramaniam , ANDREI, Valentin , APPU, Abhishek , GEORGE, Varghese , KOKER, Altug , MACPHERSON, Mike , OULD-AHMED-VALL, Elmoustapha , RANGANATHAN, Vasanth , RAY, Joydeep , STRIRAMASSARMA, Lakshminarayanan , KIM, SungYe
IPC: G06F9/30
Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides techniques to optimize training and inference on a systolic array when using sparse data. One embodiment provides techniques to use decompression information when performing sparse compute operations. One embodiment enables the disaggregation of special function compute arrays via a shared reg file. One embodiment enables packed data compress and expand operations on a GPGPU. One embodiment provides techniques to exploit block sparsity within the cache hierarchy of a GPGPU.
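For illustration only, a minimal Python sketch of mask-driven packed compress and expand, one of the operations the abstract mentions; the mask-based encoding is an assumption, not the claimed GPGPU instruction set.

```python
def packed_compress(values, mask):
    """Gather the elements selected by `mask` into a contiguous (packed) list."""
    return [v for v, keep in zip(values, mask) if keep]

def packed_expand(packed, mask, fill=0):
    """Scatter packed elements back to the positions selected by `mask`,
    filling the remaining positions with `fill`."""
    it = iter(packed)
    return [next(it) if keep else fill for keep in mask]

if __name__ == "__main__":
    mask = [True, False, True, False]
    packed = packed_compress([5, 6, 7, 8], mask)
    print(packed)                       # [5, 7]
    print(packed_expand(packed, mask))  # [5, 0, 7, 0]
```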
-
Publication No.: WO2020190797A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022835
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: KOKER, Altug , RAY, Joydeep , ANANTARAMAN, Aravindh , ANDREI, Valentin , APPU, Abhishek , COLEMAN, Sean , GALOPPO VON BORRIES, Nicolas , GEORGE, Varghese , K, Pattabhiraman , KIM, SungYe , MACPHERSON, Mike , MAIYURAN, Subramaniam , OULD-AHMED-VALL, Elmoustapha , RANGANATHAN, Vasanth , VALERIO, James
IPC: G06F12/0811 , G06F12/0875
Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. A graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) (2810) having a first memory (2870-1), a first memory side cache memory (2880-1), a first communication fabric (2860-1), and a first memory management unit (MMU) (2855-1). The graphics processor includes a second GPU (2820) having a second memory (2870-2), a second memory side cache memory (2880-2), a second MMU (2855-2), and a second communication fabric (2860-2) that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.
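For illustration only, a minimal Python sketch of the described update flow. A simple set of remotely cached addresses stands in for however the MMU actually tracks whether the peer GPU's memory-side cache holds the line; that tracking scheme is an assumption.

```python
class ToyMMU:
    """Toy model of a per-GPU MMU that services writes to its local memory,
    updates the local memory-side cache, and decides whether the peer GPU's
    memory-side cache also needs an update."""

    def __init__(self, local_memory, local_cache, peer_cache, remotely_cached):
        self.local_memory = local_memory        # dict: address -> value
        self.local_cache = local_cache          # dict: address -> value
        self.peer_cache = peer_cache            # dict: address -> value
        self.remotely_cached = remotely_cached  # set of addresses cached by the peer

    def write(self, address, value):
        self.local_memory[address] = value      # update content in the first memory
        if address in self.local_cache:
            self.local_cache[address] = value   # update the first memory-side cache
        if address in self.remotely_cached:
            # determined that the second memory-side cache must be updated too
            self.peer_cache[address] = value

if __name__ == "__main__":
    mmu = ToyMMU({}, {0x10: 1}, {0x10: 1}, {0x10})
    mmu.write(0x10, 99)
    print(mmu.local_cache[0x10], mmu.peer_cache[0x10])  # 99 99
```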
-
Publication No.: WO2020190429A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/017897
Filing Date: 2020-02-12
Applicant: INTEL CORPORATION , VEMULAPALLI, Vikranth , STRIRAMASSARMA, Lakshminarayanan , MACPHERSON, Mike , ANANTARAMAN, Aravindh , ASHBAUGH, Ben , RAMADOSS, Murali , SADLER, William B. , PEARCE, Jonathan , JANUS, Scott , INSKO, Brent , RANGANATHAN, Vasanth , SINHA, Kamal , HUNTER, Arthur , SURTI, Prasoonkumar , GALOPPO VON BORRIES, Nicolas , RAY, Joydeep , APPU, Abhisek R. , OULD-AHMED-VALL, ElMoustapha , KOKER, Altug , KIM, Sungye , MAIYURAN, Subramaniam , ANDREI, Valentin
Inventor: VEMULAPALLI, Vikranth , STRIRAMASSARMA, Lakshminarayanan , MACPHERSON, Mike , ANANTARAMAN, Aravindh , ASHBAUGH, Ben , RAMADOSS, Murali , SADLER, William B. , PEARCE, Jonathan , JANUS, Scott , INSKO, Brent , RANGANATHAN, Vasanth , SINHA, Kamal , HUNTER, Arthur , SURTI, Prasoonkumar , GALOPPO VON BORRIES, Nicolas , RAY, Joydeep , APPU, Abhisek R. , OULD-AHMED-VALL, ElMoustapha , KOKER, Altug , KIM, Sungye , MAIYURAN, Subramaniam , ANDREI, Valentin
IPC: G06F12/0862 , G06F12/0897 , G06F12/0888 , G06F9/38
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs, including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache; and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
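For illustration only, a minimal Python sketch of the described gating policy; the specific threshold value is an assumption.

```python
def choose_prefetch_target(l1_hit_rate, threshold=0.9):
    """Decide where prefetched data should land, per the described policy:
    if the L1 hit rate is already at or above the threshold, limit the
    prefetch to the L3 cache; otherwise allow prefetching into L1."""
    return "L3" if l1_hit_rate >= threshold else "L1"

if __name__ == "__main__":
    print(choose_prefetch_target(0.95))  # L3: L1 is doing well, avoid polluting it
    print(choose_prefetch_target(0.60))  # L1: bring data closer to the compute
```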