Patent search ap:("INTEL CORPORATION") AND inv:"Jonathan Pearce" Page 2

11.

发明授权
Data prefetching for graphics data processing 有权

公开(公告)号：US11409658B2

公开(公告)日：2022-08-09

申请号：US17161465

申请日：2021-01-28

Applicant: Intel Corporation

Inventor： Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, Jr. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei

IPC: G09G5/36 , G06F12/0862 , G06T1/20 , G06T1/60

Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.

12.

发明授权
Data prefetching for graphics data processing 有权

公开(公告)号：US10909039B2

公开(公告)日：2021-02-02

申请号：US16355015

申请日：2019-03-15

Applicant: Intel Corporation

Inventor： Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, Jr. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei

IPC: G09G5/36 , G06F12/0862 , G06T1/20 , G06T1/60

Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.

13.

发明申请
DATA PREFETCHING FOR GRAPHICS DATA PROCESSING 审中-公开

公开(公告)号：US20200293450A1

公开(公告)日：2020-09-17

申请号：US16355015

申请日：2019-03-15

Applicant: Intel Corporation

Inventor： Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, JR. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei

IPC: G06F12/0862 , G06T1/60 , G06T1/20

Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.

14.

发明申请
THREAD GROUP SCHEDULING FOR GRAPHICS PROCESSING 审中-公开

公开(公告)号：US20200293380A1

公开(公告)日：2020-09-17

申请号：US16355130

申请日：2019-03-15

Applicant: Intel Corporation

Inventor： Ben Ashbaugh , Jonathan Pearce , Murali Ramadoss , Vikranth Vemulapalli , William B. Sadler , Sungye Kim , Marian Alin Petre

IPC: G06F9/50 , G06F12/0837 , G06F9/38 , G06F9/54

Abstract: Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors, wherein the one or more processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling of the plurality of groups of threads including the plurality of processors to apply a bias for scheduling the plurality of groups of threads according to a cache locality for the one or more caches.

15.

发明授权
Scalar core integration 有权

公开(公告)号：US12117962B2

公开(公告)日：2024-10-15

申请号：US18450685

申请日：2023-08-16

Applicant: Intel Corporation

Inventor： Joydeep Ray , Aravindh Anantaraman , Abhishek R. Appu , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Subramaniam Maiyuran , Nicolas Galoppo Von Borries , Varghese George , Mike Macpherson , Ben Ashbaugh , Murali Ramadoss , Vikranth Vemulapalli , William Sadler , Jonathan Pearce , Sungye Kim

IPC: G06T1/00 , G06F9/30 , G06F9/38 , G06F15/80 , G06T15/00

CPC classification number: G06F15/8069 , G06F9/30163 , G06F9/3877 , G06T15/005 , G06F9/3836

Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.

16.

发明授权
Data prefetching for graphics data processing 有权

公开(公告)号：US11892950B2

公开(公告)日：2024-02-06

申请号：US17865666

申请日：2022-07-15

Applicant: Intel Corporation

Inventor： Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, Jr. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei

IPC: G06F12/084 , G06F12/0862 , G06T1/20 , G06T1/60

CPC classification number: G06F12/0862 , G06T1/20 , G06T1/60 , G06F2212/602 , G06F2212/608

Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.

17.

发明授权
Apparatus and method for adaptable and efficient lane-wise tensor processing 有权

公开(公告)号：US11379229B2

公开(公告)日：2022-07-05

申请号：US16987838

申请日：2020-08-07

Applicant: INTEL CORPORATION

Inventor： Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Debbie Marr , Abhijit Davare , Asit Mishra , Steven Burns , Desmond A. Kirkpatrick , Andrey Ayupov , Anton Alexandrovich Sorokin , Eriko Nurvitadhi

IPC: G06F9/30 , G06F9/38 , G06F17/16 , G06F7/57 , G06F12/0831 , G06F12/084

Abstract: An apparatus and method for performing efficient, adaptable tensor operations. For example, one embodiment of a processor comprises: front end circuitry to schedule matrix operations responsive to a matrix multiplication instruction; a plurality of lanes to perform parallel execution of the matrix operations, wherein a lane comprises an arithmetic logic unit to multiply a block of a first matrix with a block of a second matrix to generate a product and to accumulate the product with a block of a third matrix, and wherein the matrix blocks are to be stored in registers within the lane; and broadcast circuitry to broadcast one or more invariant matrix blocks to at least one of different registers within the lane and different registers across different lanes.

18.

发明授权
Thread group scheduling for graphics processing 有权

公开(公告)号：US11281496B2

公开(公告)日：2022-03-22

申请号：US16355130

申请日：2019-03-15

Applicant: Intel Corporation

Inventor： Ben Ashbaugh , Jonathan Pearce , Murali Ramadoss , Vikranth Vemulapalli , William B. Sadler , Sungye Kim , Marian Alin Petre

IPC: G06F9/46 , G06F9/50 , G06F9/38 , G06F9/54 , G06F12/0837 , G06F9/48 , G06F9/345 , G06T1/60 , G06F9/30 , G06T15/00 , G06F16/245 , G06T1/20

Abstract: Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors, wherein the one or more processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling of the plurality of groups of threads including the plurality of processors to apply a bias for scheduling the plurality of groups of threads according to a cache locality for the one or more caches.

19.

发明授权
Apparatus and method for a high throughput parallel co-processor and interconnect with low offload latency 有权

公开(公告)号：US10915328B2

公开(公告)日：2021-02-09

申请号：US16220528

申请日：2018-12-14

Applicant: Intel Corporation

Inventor： Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr

IPC: G06F9/38 , G06F9/30 , G06F9/28 , G06F13/40

Abstract: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the cores executing a parallel processing call instruction from the primary thread.