-
Publication No.: WO2022271226A1
Publication Date: 2022-12-29
Application No.: PCT/US2022/020408
Filing Date: 2022-03-15
Applicant: INTEL CORPORATION
Inventor: PARRA, Jorge , FU, Fangwen , MAIYURAN, Subramaniam , GEORGE, Varghese , MACPHERSON, Mike , PAL, Supratim , GURRAM, Chandra , GANAPATHY, Sabareesh , AVANCHA, Sasikanth , VOOTURI, Dharma Teja , MELLEMPUDI, Naveen , DAS, Dipankar
IPC: G06F17/16 , G06F9/00 , G06F15/8046 , G06F7/523 , G06F7/5443 , G06F9/3001 , G06F9/30036
Abstract: A processing apparatus is described herein that includes a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements configured to perform processing operations on input matrix elements based on output sparsity metadata. The output sparsity metadata indicates to the multiple processing elements to bypass multiplication for a first row of elements of a second matrix and multiply a second row of elements of the second matrix with a column of matrix elements of a first matrix.
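For illustration only, the following minimal Python sketch mimics the described behavior in software: a per-row flag (a hypothetical encoding of the output sparsity metadata) tells the loop to bypass the multiply for masked-off rows of the second matrix, while unmasked rows are multiplied with the corresponding column of the first matrix. It is a sketch of the idea, not the claimed systolic-array hardware.

```python
import numpy as np

def sparse_aware_matmul(a, b, output_sparsity_mask):
    """Multiply a (M x K) by b (K x N), skipping work for rows of `b`
    whose metadata bit marks them as bypassed (hypothetical encoding:
    False means "bypass multiplication for this row of b")."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and len(output_sparsity_mask) == k
    result = np.zeros((m, n), dtype=a.dtype)
    for idx in range(k):
        if not output_sparsity_mask[idx]:
            continue  # metadata says: bypass multiplication for this row of b
        # multiply row idx of b with column idx of a (rank-1 update)
        result += np.outer(a[:, idx], b[idx, :])
    return result

if __name__ == "__main__":
    a = np.arange(6, dtype=float).reshape(2, 3)
    b = np.arange(12, dtype=float).reshape(3, 4)
    mask = np.array([False, True, True])  # skip the first row of b
    print(sparse_aware_matmul(a, b, mask))
```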
-
Publication No.: WO2020190799A2
Publication Date: 2020-09-24
Application No.: PCT/US2020/022837
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: KOKER, Altug , RAY, Joydeep , ASHBAUGH, Ben , PEARCE, Jonathan , APPU, Abhishek , RANGANATHAN, Vasanth , STRIRAMASSARMA, Lakshminarayanan , OULD-AHMED-VALL, Elmoustapha , ANANTARAMAN, Aravindh , ANDREI, Valentin , GALOPPO VON BORRIES, Nicolas , GEORGE, Varghese , HAREL, Yoav , HUNTER, Arthur Jr. , INSKO, Brent , JANUS, Scott , K, Pattabhiraman , MACPHERSON, Mike , MAIYURAN, Subramaniam , PETRE, Marian Alin , RAMADOSS, Murali , SHAH, Shailesh , SINHA, Kamal , SURTI, Prasoonkumar , VEMULAPALLI, Vikranth
IPC: G06F9/38 , G06F12/0862 , G06F9/30 , G06F12/02 , G06F12/06 , G06F12/0804 , G06F12/0893 , G06F12/12 , G06F12/128 , G06F15/173 , G06F9/50
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
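For illustration only, a minimal Python sketch of the described decision, assuming a hypothetical per-instruction override field; the actual mechanism is a hardware cache controller, not software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class CachePolicy(Enum):
    WRITE_BACK = "write_back"
    WRITE_THROUGH = "write_through"
    UNCACHED = "uncached"

@dataclass
class Instruction:
    # Hypothetical per-instruction cache-control field; None means the
    # instruction does not override the default settings.
    cache_override: Optional[CachePolicy] = None

class CacheController:
    """Toy controller that decides whether default settings or an
    instruction will control cache operations for an access."""

    def __init__(self, default_policy: CachePolicy = CachePolicy.WRITE_BACK):
        self.default_policy = default_policy

    def resolve_policy(self, instr: Instruction) -> CachePolicy:
        if instr.cache_override is not None:
            return instr.cache_override   # the instruction controls the cache operation
        return self.default_policy        # fall back to the default settings

if __name__ == "__main__":
    ctrl = CacheController()
    print(ctrl.resolve_policy(Instruction()).value)                      # write_back
    print(ctrl.resolve_policy(Instruction(CachePolicy.UNCACHED)).value)  # uncached
```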
-
Publication No.: WO2020190798A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022836
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: STRIRAMASSARMA, Lakshminarayanan , SURTI, Prasoonkumar , GEORGE, Varghese , ASHBAUGH, Ben , ANANTARAMAN, Aravindh , ANDREI, Valentin , APPU, Abhishek , GALOPPO VON BORRIES, Nicolas , KOKER, Altug , MACPHERSON, Mike , MAIYURAN, Subramaniam , MISTRY, Nilay , OULD-AHMED-VALL, Elmoustapha , PANNEER, Selvakumar , RANGANATHAN, Vasanth , RAY, Joydeep , SHAH, Ankur , TANGRI, Saurabh
IPC: G06F9/38 , G06F12/0862 , G06F9/30
Abstract: Multi-tile memory management for detecting cross-tile access, providing multi-tile inference scaling with multicasting of data via a copy operation, and providing page migration is disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory, and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross-tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when such frequent cross-tile memory accesses occur.
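For illustration only, a minimal Python sketch of the described detection logic, with a hypothetical per-page access-count threshold standing in for the hardware's notion of "frequent" cross-tile accesses.

```python
from collections import Counter

class CrossTileMonitor:
    """Toy model of a memory controller that counts accesses from the local
    GPU tile to pages resident on a remote tile and signals a data transfer
    (e.g. page migration) once accesses to a page become frequent."""

    def __init__(self, local_tile: int, threshold: int = 8):
        self.local_tile = local_tile
        self.threshold = threshold           # illustrative assumption
        self.remote_access_counts = Counter()

    def record_access(self, page: int, owner_tile: int):
        if owner_tile == self.local_tile:
            return None                      # local access, nothing to do
        self.remote_access_counts[page] += 1
        if self.remote_access_counts[page] >= self.threshold:
            self.remote_access_counts[page] = 0
            # "send a message to initiate a data transfer mechanism"
            return {"action": "migrate_page", "page": page, "to_tile": self.local_tile}
        return None

if __name__ == "__main__":
    mon = CrossTileMonitor(local_tile=0, threshold=3)
    for _ in range(3):
        msg = mon.record_access(page=42, owner_tile=1)
    print(msg)  # {'action': 'migrate_page', 'page': 42, 'to_tile': 0}
```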
-
Publication No.: WO2020190809A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022847
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: APPU, Abhishek , MAIYURAN, Subramaniam , MACPHERSON, Mike , FU, Fangwen , CHEN, Jiasheng , GEORGE, Varghese , RANGANATHAN, Vasanth , GARG, Ashutosh , RAY, Joydeep
Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
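For illustration only, a minimal Python sketch of a block-sparse dot product in the spirit of the abstract; the block-mask/values encoding is an assumption, not the claimed instruction format.

```python
import numpy as np

def block_sparse_dot(values, block_mask, dense, block_size):
    """Dot product of a block-sparse vector with a dense vector.

    `block_mask` has one flag per block saying which blocks of the sparse
    vector are non-zero; `values` stores only those non-zero blocks,
    concatenated. The encoding details are illustrative assumptions."""
    acc = 0.0
    stored = 0
    for blk, present in enumerate(block_mask):
        if not present:
            continue  # all-zero block: skip the multiplies entirely
        start = blk * block_size
        block = values[stored * block_size:(stored + 1) * block_size]
        acc += float(np.dot(block, dense[start:start + block_size]))
        stored += 1
    return acc

if __name__ == "__main__":
    dense = np.arange(8, dtype=float)           # length 8, two blocks of 4
    block_mask = [True, False]                  # second block is all zeros
    values = np.array([1.0, 2.0, 3.0, 4.0])     # only the first block is stored
    print(block_sparse_dot(values, block_mask, dense, block_size=4))  # 20.0
```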
-
Publication No.: WO2020190806A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022844
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: KOKER, Altug , GEORGE, Varghese , ANANTARAMAN, Aravindh , ANDREI, Valentin , APPU, Abhishek R. , COORAY, Niran , GALOPPO VON BORRIES, Nicolas , MACPHERSON, Mike , MAIYURAN, Subramaniam , OULD-AHMED-VALL, ElMoustapha , PUFFER, David , RANGANATHAN, Vasanth , RAY, Joydeep , SHAH, Ankur N. , STRIRAMASSARMA, Lakshminarayanan , SURTI, Prasoonkumar , TANGRI, Saurabh
IPC: G06F9/38 , G06F12/0862 , G06F9/30
Abstract: Embodiments are generally directed to graphics processor data access and sharing. An embodiment of an apparatus includes a circuit element to produce a result in processing of an application; a load-store unit to receive the result and generate prefetch information for a cache utilizing the result; and a prefetch generator to produce prefetch addresses based at least in part on the prefetch information; wherein the load-store unit is to receive software assistance for prefetching, and wherein generation of the prefetch information is based at least in part on the software assistance.
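For illustration only, a minimal Python sketch of turning a just-produced result address plus a software-supplied hint into prefetch addresses; the hint format, prefetch depth, and cache-line size are assumptions.

```python
def generate_prefetch_addresses(result_address, software_hint, num_prefetches=4,
                                line_size=64):
    """Produce prefetch addresses from the address of a just-produced result,
    guided by a software-supplied hint (here, an expected stride in bytes)."""
    stride = software_hint.get("stride", line_size)
    base = result_address & ~(line_size - 1)   # align to a cache-line boundary
    return [base + (i + 1) * stride for i in range(num_prefetches)]

if __name__ == "__main__":
    # Software assistance says the access pattern walks memory in 256-byte strides.
    addrs = generate_prefetch_addresses(0x1000, {"stride": 256})
    print([hex(a) for a in addrs])  # ['0x1100', '0x1200', '0x1300', '0x1400']
```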
-
Publication No.: WO2020190804A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022842
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: APPU, Abhishek R. , ANANTARAMAN, Aravindh , OULD-AHMED-VALL, ElMoustapha , ANDREI, Valentin , GALOPPO VON BORRIES, Nicolas , GEORGE, Varghese , KOKER, Altug , MACPHERSON, Mike , MAIYURAN, Subramaniam , RAY, Joydeep , RANGANATHAN, Vasanth
IPC: G06F3/06 , G06F12/0895 , G06F7/58
Abstract: Methods and apparatus relating to data initialization techniques. In an example, an apparatus comprises a processor to read one or more metadata codes which map to one or more cache lines in a cache memory and invoke a random number generator to generate random numerical data for the one or more cache lines in response to a determination that the one or more metadata codes indicate that the cache lines are to contain random numerical data. Other embodiments are also disclosed and claimed.
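For illustration only, a minimal Python sketch of the described initialization, with hypothetical metadata code values; the real apparatus performs this per cache line in hardware.

```python
import random

# Hypothetical metadata encoding: one code per cache line.
META_ZERO = 0      # line is to contain zeros
META_RANDOM = 1    # line is to contain random numerical data
META_BACKED = 2    # line is backed by real data elsewhere

def initialize_cache_lines(metadata_codes, line_words=16, seed=None):
    """Return initialized contents for each cache line based on its metadata
    code, invoking a random number generator only for lines marked as random."""
    rng = random.Random(seed)
    lines = []
    for code in metadata_codes:
        if code == META_RANDOM:
            lines.append([rng.random() for _ in range(line_words)])
        elif code == META_ZERO:
            lines.append([0.0] * line_words)
        else:
            lines.append(None)  # would be fetched from memory in a real system
    return lines

if __name__ == "__main__":
    lines = initialize_cache_lines([META_ZERO, META_RANDOM], line_words=4, seed=0)
    print(lines[0], lines[1])
```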
-
Publication No.: WO2020190811A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022849
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: KOKER, Altug , STRIRAMASSARMA, Lakshminarayanan , ANANTARAMAN, Aravindh , ANDREI, Valentin , APPU, Abhishek R. , COLEMAN, Sean , GEORGE, Varghese , K, Pattabhiraman , MACPHERSON, Mike , MAIYURAN, Subramaniam , OULD-AHMED-VALL, ElMoustapha , RANGANATHAN, Vasanth , RAY, Joydeep , S, Jayakrishna P , SURTI, Prasoonkumar
IPC: G06F9/38 , G06F12/0862 , G06F9/30
Abstract: Embodiments are generally directed to cache structure and utilization. An embodiment of an apparatus includes one or more processors including a graphics processor; a memory for storage of data for processing by the one or more processors; and a cache to cache data from the memory; wherein the apparatus is to provide for dynamic overfetching of cache lines for the cache, including receiving a read request and accessing the cache for the requested data, and upon a miss in the cache, overfetching data from memory or a higher level cache in addition to fetching the requested data, wherein the overfetching of data is based at least in part on a current overfetch boundary, and provides for data to be prefetched extending to the current overfetch boundary.
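For illustration only, a minimal Python sketch of miss-triggered overfetching up to a boundary. Treating the "current overfetch boundary" as an address limit, and the dictionary-based cache and memory, are assumptions of this toy model.

```python
def handle_read(cache, memory, address, overfetch_boundary, line_size=64):
    """On a cache miss, fetch the requested line and overfetch the following
    lines up to the current overfetch boundary; on a hit, just return data."""
    line = address // line_size * line_size
    if line in cache:
        return cache[line]                  # hit: no overfetch needed
    # Miss: fetch the requested line plus every line up to the boundary.
    fetch = line
    while fetch <= overfetch_boundary:
        cache[fetch] = memory.get(fetch, 0)  # overfetch from memory / higher-level cache
        fetch += line_size
    return cache[line]

if __name__ == "__main__":
    cache, memory = {}, {0: 10, 64: 11, 128: 12, 192: 13}
    handle_read(cache, memory, address=0, overfetch_boundary=128)
    print(sorted(cache))   # [0, 64, 128] -- lines fetched up to the boundary
```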
-
Publication No.: WO2020190807A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022845
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: SURTI, Prasoonkumar , MAIYURAN, Subramaniam , ANDREI, Valentin , APPU, Abhishek , GEORGE, Varghese , KOKER, Altug , MACPHERSON, Mike , OULD-AHMED-VALL, Elmoustapha , RANGANATHAN, Vasanth , RAY, Joydeep , STRIRAMASSARMA, Lakshminarayanan , KIM, SungYe
IPC: G06F9/30
Abstract: Embodiments described herein include, software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides techniques to optimize training and inference on a systolic array when using sparse data. One embodiment provides techniques to use decompression information when performing sparse compute operations. One embodiment enables the disaggregation of special function compute arrays via a shared reg file. One embodiment enables packed data compress and expand operations on a GPGPU. One embodiment provides techniques to exploit block sparsity within the cache hierarchy of a GPGPU.
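For illustration only, a minimal Python sketch of mask-driven packed compress and expand, one of the operations the abstract mentions; the mask-based encoding is an assumption, not the claimed GPGPU instruction set.

```python
def packed_compress(values, mask):
    """Gather the elements selected by `mask` into a contiguous (packed) list."""
    return [v for v, keep in zip(values, mask) if keep]

def packed_expand(packed, mask, fill=0):
    """Scatter packed elements back to the positions selected by `mask`,
    filling the remaining positions with `fill`."""
    it = iter(packed)
    return [next(it) if keep else fill for keep in mask]

if __name__ == "__main__":
    mask = [True, False, True, False]
    packed = packed_compress([5, 6, 7, 8], mask)
    print(packed)                       # [5, 7]
    print(packed_expand(packed, mask))  # [5, 0, 7, 0]
```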
-
Publication No.: WO2020190797A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/022835
Filing Date: 2020-03-14
Applicant: INTEL CORPORATION
Inventor: KOKER, Altug , RAY, Joydeep , ANANTARAMAN, Aravindh , ANDREI, Valentin , APPU, Abhishek , COLEMAN, Sean , GALOPPO VON BORRIES, Nicolas , GEORGE, Varghese , K, Pattabhiraman , KIM, SungYe , MACPHERSON, Mike , MAIYURAN, Subramaniam , OULD-AHMED-VALL, Elmoustapha , RANGANATHAN, Vasanth , VALERIO, James
IPC: G06F12/0811 , G06F12/0875
Abstract: Systems and methods for updating remote memory side caches in a multi-GPU configuration are disclosed herein. A graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) (2810) having a first memory (2870-1), a first memory side cache memory (2880-1), a first communication fabric (2860-1), and a first memory management unit (MMU) (2855-1). The graphics processor includes a second GPU (2820) having a second memory (2870-2), a second memory side cache memory (2880-2), a second MMU (2855-2), and a second communication fabric (2860-2) that is communicatively coupled to the first communication fabric. The first MMU is configured to control memory requests for the first memory, to update content in the first memory, to update content in the first memory side cache memory, and to determine whether to update the content in the second memory side cache memory.
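For illustration only, a minimal Python sketch of the described update flow. A simple set of remotely cached addresses stands in for however the MMU actually tracks whether the peer GPU's memory-side cache holds the line; that tracking scheme is an assumption.

```python
class ToyMMU:
    """Toy model of a per-GPU MMU that services writes to its local memory,
    updates the local memory-side cache, and decides whether the peer GPU's
    memory-side cache also needs an update."""

    def __init__(self, local_memory, local_cache, peer_cache, remotely_cached):
        self.local_memory = local_memory        # dict: address -> value
        self.local_cache = local_cache          # dict: address -> value
        self.peer_cache = peer_cache            # dict: address -> value
        self.remotely_cached = remotely_cached  # set of addresses cached by the peer

    def write(self, address, value):
        self.local_memory[address] = value      # update content in the first memory
        if address in self.local_cache:
            self.local_cache[address] = value   # update the first memory-side cache
        if address in self.remotely_cached:
            # determined that the second memory-side cache must be updated too
            self.peer_cache[address] = value

if __name__ == "__main__":
    mmu = ToyMMU({}, {0x10: 1}, {0x10: 1}, {0x10})
    mmu.write(0x10, 99)
    print(mmu.local_cache[0x10], mmu.peer_cache[0x10])  # 99 99
```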
-
Publication No.: WO2020190429A1
Publication Date: 2020-09-24
Application No.: PCT/US2020/017897
Filing Date: 2020-02-12
Applicant: INTEL CORPORATION , VEMULAPALLI, Vikranth , STRIRAMASSARMA, Lakshminarayanan , MACPHERSON, Mike , ANANTARAMAN, Aravindh , ASHBAUGH, Ben , RAMADOSS, Murali , SADLER, William B. , PEARCE, Jonathan , JANUS, Scott , INSKO, Brent , RANGANATHAN, Vasanth , SINHA, Kamal , HUNTER, Arthur , SURTI, Prasoonkumar , GALOPPO VON BORRIES, Nicolas , RAY, Joydeep , APPU, Abhisek R. , OULD-AHMED-VALL, ElMoustapha , KOKER, Altug , KIM, Sungye , MAIYURAN, Subramaniam , ANDREI, Valentin
Inventor: VEMULAPALLI, Vikranth , STRIRAMASSARMA, Lakshminarayanan , MACPHERSON, Mike , ANANTARAMAN, Aravindh , ASHBAUGH, Ben , RAMADOSS, Murali , SADLER, William B. , PEARCE, Jonathan , JANUS, Scott , INSKO, Brent , RANGANATHAN, Vasanth , SINHA, Kamal , HUNTER, Arthur , SURTI, Prasoonkumar , GALOPPO VON BORRIES, Nicolas , RAY, Joydeep , APPU, Abhisek R. , OULD-AHMED-VALL, ElMoustapha , KOKER, Altug , KIM, Sungye , MAIYURAN, Subramaniam , ANDREI, Valentin
IPC: G06F12/0862 , G06F12/0897 , G06F12/0888 , G06F9/38
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs, including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache; and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
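For illustration only, a minimal Python sketch of the described gating policy; the specific threshold value is an assumption.

```python
def choose_prefetch_target(l1_hit_rate, threshold=0.9):
    """Decide where prefetched data should land, per the described policy:
    if the L1 hit rate is already at or above the threshold, limit the
    prefetch to the L3 cache; otherwise allow prefetching into L1."""
    return "L3" if l1_hit_rate >= threshold else "L1"

if __name__ == "__main__":
    print(choose_prefetch_target(0.95))  # L3: L1 is doing well, avoid polluting it
    print(choose_prefetch_target(0.60))  # L1: bring data closer to the compute
```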