Patent search ap:("Intel Corporation") AND inv:"Kevin Nealis" Page 1

1.

发明授权
Specialized fixed function hardware for efficient convolution 有权

公开(公告)号：US12050984B2

公开(公告)日：2024-07-30

申请号：US17083080

申请日：2020-10-28

Applicant: Intel Corporation

Inventor： Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang

IPC: G06N3/06 , G06F9/30 , G06F9/38 , G06F16/17 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06T1/20

CPC classification number: G06N3/063 , G06F9/3001 , G06F9/3017 , G06F9/3851 , G06F9/3887 , G06F9/3895 , G06F16/17 , G06N3/044 , G06N3/045 , G06N3/084 , G06T1/20

Abstract: One embodiment provides for a general-purpose graphics processing unit including a scheduler to schedule multiple matrix operations for execution by a general-purpose graphics processing unit. The multiple matrix operations are determined based on a single machine learning compute instruction. The single machine learning compute instruction is a convolution instruction and the multiple matrix operations are associated with a convolution operation.

2.

发明授权
Dynamic distributed training of machine learning models 有权

公开(公告)号：US11797837B2

公开(公告)日：2023-10-24

申请号：US15494971

申请日：2017-04-24

Applicant: Intel Corporation

Inventor： Altug Koker , Abhishek R. Appu , Kamal Sinha , Joydeep Ray , Balaji Vembu , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , John C. Weast , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Farshad Akhbari , Nadathur Rajagopalan Satish , Liwei Ma , Jeremy Bottleson , Eriko Nurvitadhi , Travis T. Schluessler , Ankur N. Shah , Jonathan Kennedy , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06N3/08 , G06N20/00 , G06N3/063 , G06N3/044 , G06N3/045 , G06N3/048

CPC classification number: G06N3/08 , G06N3/044 , G06N3/045 , G06N3/063 , G06N20/00 , G06N3/048

Abstract: In an example, an apparatus comprises a plurality of execution units comprising at least a first type of execution unit and a second type of execution unit and logic, at least partially including hardware logic, to analyze a workload and assign the workload to one of the first type of execution unit or the second type of execution unit. Other embodiments are also disclosed and claimed.

3.

发明公开
DYNAMIC DISTRIBUTED TRAINING OF MACHINE LEARNING MODELS 审中-公开

公开(公告)号：US20230334316A1

公开(公告)日：2023-10-19

申请号：US18314450

申请日：2023-05-09

Applicant: Intel Corporation

Inventor： Altug Koker , Abhishek R. Appu , Kamal Sinha , Joydeep Ray , Balaji Vembu , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , John C. Weast , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Farshad Akhbari , Nadathur Rajagopalan Satish , Liwei Ma , Jeremy Bottleson , Eriko Nurvitadhi , Travis T. Schluessler , Ankur N. Shah , Jonathan Kennedy , Vasanth Ranganathan , Sanjeev Jahagirdar

IPC: G06N3/08 , G06N20/00 , G06N3/063 , G06N3/044 , G06N3/045

CPC classification number: G06N3/08 , G06N20/00 , G06N3/063 , G06N3/044 , G06N3/045 , G06N3/048

Abstract: Described herein is a graphics processor comprising a memory device and a graphics processing cluster coupled with the memory device. The graphics processing cluster includes a plurality of graphics multiprocessors interconnected via a data interconnect. A graphics multiprocessor includes circuitry configured to load a modular neural network including a plurality of subnetworks, each of the plurality of subnetworks trained to perform a computer vision operation on a separate subject.

4.

发明授权
Specialized fixed function hardware for efficient convolution 有权

公开(公告)号：US11475286B2

公开(公告)日：2022-10-18

申请号：US17558285

申请日：2021-12-21

Applicant: Intel Corporation

Inventor： Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang

IPC: G06N3/06 , G06N3/063 , G06N3/04 , G06F9/30 , G06F9/38 , G06T1/20 , G06N3/08 , G06F16/17

Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.

5.

发明授权
Compute optimizations for neural networks using bipolar binary weight 有权

公开(公告)号：US11074072B2

公开(公告)日：2021-07-27

申请号：US16505012

申请日：2019-07-08

Applicant: Intel Corporation

Inventor： Kevin Nealis , Anbang Yao , Xiaoming Chen , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha

IPC: G06F9/30 , G06F9/38 , G06N3/04 , G06N3/063 , G06N3/08 , G06T1/20

Abstract: One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a bipolar binary weight associated with a neural network and an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a multiplication operation on the multi-bit input based on the bipolar binary weight to generate an intermediate product and the adder is to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.

6.

发明授权
Specialized fixed function hardware for efficient convolution 有权

公开(公告)号：US10824938B2

公开(公告)日：2020-11-03

申请号：US15494723

申请日：2017-04-24

Applicant: Intel Corporation

Inventor： Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang

IPC: G06N3/06 , G06N3/063 , G06N3/04 , G06F9/30 , G06F9/38 , G06T1/20 , G06N3/08

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to perform one or more machine learning operations, wherein the decode unit, based on parameters of the one or more machine learning operations, is to request a scheduler to schedule the one or more machine learning operations to one of an array of programmable compute units and a fixed function compute unit.

7.

发明授权
Compute optimizations for low precision machine learning operations 有权

公开(公告)号：US10242423B2

公开(公告)日：2019-03-26

申请号：US15789565

申请日：2017-10-20

Applicant: Intel Corporation

Inventor： Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland

IPC: G06T1/20 , G06N99/00 , G06F7/483 , G06T15/00 , G06T1/60 , G06F3/14

Abstract: One embodiment provides an accelerator module comprising a memory stack including multiple memory dies; a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers, the GPU including a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction; the at least one single instruction to cause at least a portion of the GPU to perform a floating-point operation on input having differing precisions; and the floating-point operation is a two-dimensional matrix multiply and accumulate operation.

8.

发明申请
PROGRAMMABLE COARSE GRAINED AND SPARSE MATRIX COMPUTE HARDWARE WITH ADVANCED SCHEDULING 审中-公开

公开(公告)号：US20180315158A1

公开(公告)日：2018-11-01

申请号：US15581182

申请日：2017-04-28

Applicant: Intel Corporation

Inventor： Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Altug Koker , Narayan Srinivasa , Dukhwan Kim , Sara S. Baghsorkhi , Justin E. Gottschlich , Feng Chen , Elmoustapha Ould-Ahmed-Vall , Kevin Nealis , Xiaoming Chen , Anbang Yao

IPC: G06T1/20 , G06N3/08 , G06N3/04

CPC classification number: G06T1/20 , G06F9/3001 , G06F9/3017 , G06F9/3851 , G06F9/3887 , G06F9/3895 , G06N3/0445 , G06N3/0454 , G06N3/063 , G06N3/084

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.

9.

发明授权
Convolutional neural network optimization mechanism 有权

公开(公告)号：US12020135B2

公开(公告)日：2024-06-25

申请号：US17446101

申请日：2021-08-26

Applicant: Intel Corporation

Inventor： Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu

IPC: G06N3/04 , G06N3/063 , G06N3/082 , G06T1/20 , G06N3/044 , G06N3/045

CPC classification number: G06N3/04 , G06N3/063 , G06N3/082 , G06T1/20 , G06N3/044 , G06N3/045

Abstract: A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment a trained convolutional neural network (CNN) model is processed into a trained CNN model via pruning, convolution window optimization, and quantization.

10.

发明授权
Convolutional neural network optimization mechanism 有权

公开(公告)号：US11934934B2

公开(公告)日：2024-03-19

申请号：US15488551

申请日：2017-04-17

Applicant: Intel Corporation

Inventor： Liwei Ma , Elmoustapha Ould- Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu

IPC: G06N3/04 , G06N3/063 , G06N3/082 , G06T1/20 , G06N3/044 , G06N3/045

CPC classification number: G06N3/04 , G06N3/063 , G06N3/082 , G06T1/20 , G06N3/044 , G06N3/045

Abstract: An apparatus to facilitate optimization of a convolutional neural network (CNN) is disclosed. The apparatus includes optimization logic to receive a CNN model having a list of instructions and including pruning logic to optimize the list of instructions by eliminating branches in the list of instructions that comprise a weight value of 0.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification