-
1.
Publication No.: US12014265B2
Publication Date: 2024-06-18
Application No.: US18302889
Filing Date: 2023-04-19
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Amit Bleiweiss , Deborah Marr , Eugene Wang , Saritha Dwarakapuram , Sabareesh Ganapathy
IPC: G06T1/20 , G06F7/52 , G06F16/901 , G06F17/16 , G06F18/214 , G06N3/04 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N3/084 , G06N20/00 , G06T15/00 , G06N3/047
CPC classification number: G06N3/063 , G06F7/52 , G06F16/9024 , G06F17/16 , G06F18/214 , G06N3/04 , G06N3/044 , G06N3/045 , G06N3/08 , G06N3/084 , G06N20/00 , G06T1/20 , G06T15/005 , G06N3/047
Abstract: An apparatus to facilitate processing of a sparse matrix for arbitrary graph data is disclosed. The apparatus includes a graphics processing unit having a data management unit (DMU) that includes a scheduler for scheduling matrix operations, active logic for tracking active input operands, and skip logic for tracking unimportant input operands to be skipped by the scheduler. Processing circuitry is coupled to the DMU. The processing circuitry comprises a plurality of processing elements including logic to read operands, a multiplication unit to multiply two or more operands for the arbitrary graph data, and customizable circuitry to provide custom functions.
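The skip mechanism the abstract describes can be sketched roughly in software. In this minimal Python model, the sparse format (per-row lists of (column, value) pairs) and the magnitude-threshold criterion for "unimportant" operands are illustrative assumptions, not details taken from the patent.

```python
def sparse_matvec_with_skip(rows, x, threshold=0.0):
    """Multiply a sparse matrix (given as per-row lists of (col, val)
    pairs) by a dense vector x, skipping operands deemed unimportant.

    The threshold test stands in for the DMU's skip logic: operand
    pairs it flags are never handed to the multiplication unit.
    """
    y = [0.0] * len(rows)
    for i, row in enumerate(rows):
        for col, val in row:
            # Skip logic: unimportant operands are not scheduled.
            if abs(val) <= threshold or abs(x[col]) <= threshold:
                continue
            y[i] += val * x[col]  # multiplication unit + accumulate
    return y
```

In hardware the skip decision happens before scheduling, so the multiplier never sees the skipped pair; the early `continue` above models that ordering.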
-
2.
Publication No.: US20230359695A1
Publication Date: 2023-11-09
Application No.: US18222989
Filing Date: 2023-07-17
Applicant: Intel Corporation
Inventor: Jack Z. Yinger , Andrew Ling , Tomasz Czajkowski , Davor Capalija , Eriko Nurvitadhi , Deborah Marr
CPC classification number: G06F17/16 , G06F7/5443 , G06F2207/3892
Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. The bandwidth of external memory may be reduced by a reduction factor determined by the interleaving of the matrix data via the feeding pattern of the column feeder array and the row feeder array.
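The feeder arrangement can be illustrated with a small software model of a systolic matrix multiply. The output-stationary dataflow and the cycle-by-cycle skewing below are modeling assumptions made for clarity; the patent's feeder microarchitecture is more general.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    Each PE (i, j) holds one element of C. A row feeder streams row i
    of A in from the left edge and a column feeder streams column j of
    B in from the top edge, skewed so that PE (i, j) sees A[i][t] and
    B[t][j] on the same simulated cycle t.
    """
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for t in range(k):               # one step of the shared dimension
        for i in range(n):
            a = A[i][t]              # value delivered by the row feeder
            for j in range(m):
                b = B[t][j]          # value delivered by the column feeder
                C[i][j] += a * b     # PE multiply-accumulate
    return C
```

Because each A value is reused across a whole row of PEs (and each B value across a column), each matrix element is fetched from external memory once per pass rather than once per multiply, which is the source of the bandwidth reduction the abstract refers to.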
-
3.
Publication No.: US11416248B2
Publication Date: 2022-08-16
Application No.: US16833597
Filing Date: 2020-03-28
Applicant: Intel Corporation
Inventor: Jaewoong Sim , Alaa Alameldeen , Eriko Nurvitadhi , Deborah Marr
Abstract: An apparatus and method for compressing floating-point values. For example, one embodiment of a processor comprises: instruction fetch circuitry to fetch instructions from a memory, the instructions including floating-point instructions; execution circuitry to execute the floating-point instructions, each floating-point instruction having one or more floating-point operands, each floating-point operand comprising an exponent value and a significand value; floating-point compression circuitry to compress a plurality of the exponent values associated with a corresponding plurality of the floating-point operands, the floating-point compression circuitry comprising: base generation circuitry to evaluate the plurality of the exponent values to generate a first base value; and delta generation circuitry to determine a difference between the plurality of exponent values and the first base value and to generate a corresponding first plurality of delta values, wherein the floating-point compression circuitry is to store the first base value and the corresponding first plurality of delta values as a plurality of compressed exponent values.
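The base/delta exponent scheme is concrete enough to model in a few lines of Python. Choosing the minimum exponent as the base is an assumption made here so that the deltas stay non-negative and small; the patent's base generation circuitry may select the base differently.

```python
def compress_exponents(exponents):
    """Base-delta compression of a block of floating-point exponents.

    Stores one base value plus a small delta per exponent, instead of
    the full exponent field for each value. Assumption: the base is
    the minimum exponent in the block.
    """
    base = min(exponents)                 # base generation
    deltas = [e - base for e in exponents]  # delta generation
    return base, deltas

def decompress_exponents(base, deltas):
    """Reconstruct the original exponents from base + deltas."""
    return [base + d for d in deltas]
```

The win comes from locality: exponents of nearby values tend to cluster, so each delta fits in far fewer bits than a full 8- or 11-bit exponent field, and only one full-width base is stored per block.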
-
4.
Publication No.: US20190012295A1
Publication Date: 2019-01-10
Application No.: US15644526
Filing Date: 2017-07-07
Applicant: Intel Corporation
Inventor: Jack Z. Yinger , Andrew Ling , Tomasz Czajkowski , Davor Capalija , Eriko Nurvitadhi , Deborah Marr
Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. The bandwidth of external memory may be reduced by a reduction factor determined by the interleaving of the matrix data via the feeding pattern of the column feeder array and the row feeder array.
-
5.
Publication No.: US12039435B2
Publication Date: 2024-07-16
Application No.: US17845794
Filing Date: 2022-06-21
Applicant: Intel Corporation
Inventor: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
CPC classification number: G06N3/063 , G06F7/78 , G06F9/00 , G06N3/084 , G06N20/00 , G06F2207/4824 , G06T1/20
Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network, and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
-
6.

Publication No.: US20230064381A1
Publication Date: 2023-03-02
Application No.: US17740057
Filing Date: 2022-05-09
Applicant: Intel Corporation
Inventor: Jack Z. Yinger , Andrew Ling , Tomasz Czajkowski , Davor Capalija , Eriko Nurvitadhi , Deborah Marr
Abstract: Matrix multiplication systolic array feed methods and related processing element (PE) microarchitectures for efficiently implementing a systolic array generic matrix multiplier (SGEMM) in integrated circuits are provided. A systolic array architecture may include a processing element array, a column feeder array, and a row feeder array. The bandwidth of external memory may be reduced by a reduction factor determined by the interleaving of the matrix data via the feeding pattern of the column feeder array and the row feeder array.
-
7.
Publication No.: US10915328B2
Publication Date: 2021-02-09
Application No.: US16220528
Filing Date: 2018-12-14
Applicant: Intel Corporation
Inventor: Jonathan Pearce , David Sheffield , Srikanth Srinivasan , Jeffrey Cook , Deborah Marr
Abstract: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the cores executing a parallel processing call instruction from the primary thread.
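The offload flow can be approximated in software, with a thread pool standing in for the data parallel cluster and its execution lanes. All names here (`run_with_offload`, `lanes`) are hypothetical; this sketches only the control pattern, not the hardware interconnect.

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_offload(primary_work, secondary_work, items, lanes=4):
    """Model of the offload pattern: the primary thread runs its
    sequential portion, then hands an iterative, parallel region to a
    pool of execution lanes (the software stand-in for the data
    parallel cluster) and collects the results.
    """
    state = primary_work()  # sequential work on the 'host processor'
    with ThreadPoolExecutor(max_workers=lanes) as cluster:
        # The 'parallel processing call': one secondary task per item,
        # fanned out across the lanes.
        results = list(cluster.map(lambda x: secondary_work(state, x), items))
    return results
```

In the patented design this handoff is a single instruction observed by a controller integral to the host processor; the executor here merely mimics the fan-out/fan-in shape of that offload.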
-
8.

Publication No.: US10146738B2
Publication Date: 2018-12-04
Application No.: US15396511
Filing Date: 2016-12-31
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Deborah Marr
Abstract: An accelerator architecture for processing very-sparse and hyper-sparse matrix data is disclosed. A hardware accelerator comprises one or more tiles, each including a plurality of processing elements (PEs) and a data management unit (DMU). The PEs are to perform matrix operations involving very- or hyper-sparse matrices that are stored in a memory. The DMU is to provide the plurality of PEs access to the memory via an interface that is optimized to provide low-latency, parallel, random accesses to the memory. The PEs, via the DMU, perform the matrix operations by issuing random access read requests for values of the one or more matrices, issuing random access read requests for values of one or more vectors serving as a second operand, and issuing random access write requests for values of one or more vectors serving as a result.
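The three kinds of memory requests the abstract enumerates map naturally onto a sparse matrix-vector multiply. The CSR layout used below is an assumption for illustration; the patent targets very- and hyper-sparse data more generally, and the point of the DMU is that the `x[...]` and `y[...]` accesses here are scattered, random addresses rather than streams.

```python
def spmv_random_access(rowptr, col, val, x):
    """Sparse matrix-vector multiply y = M @ x over a CSR matrix,
    organized around the three request types a PE issues via the DMU:
      1. reads of matrix values (val/col, sequential per row),
      2. random-access reads of the operand vector x,
      3. random-access writes of the result vector y.
    """
    y = [0.0] * (len(rowptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            acc += val[k] * x[col[k]]  # random read of operand vector
        y[i] = acc                     # random write of result vector
    return y
```

For very- and hyper-sparse matrices, most rows are empty or nearly so, so performance is dominated by these scattered single-element accesses, which is why the DMU interface prioritizes low-latency parallel random access over bulk streaming bandwidth.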
-
9.
Publication No.: US20240403620A1
Publication Date: 2024-12-05
Application No.: US18679802
Filing Date: 2024-05-31
Applicant: Intel Corporation
Inventor: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network, and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
-
10.
Publication No.: US20230053289A1
Publication Date: 2023-02-16
Application No.: US17845794
Filing Date: 2022-06-21
Applicant: Intel Corporation
Inventor: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network, and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
-