-
Publication No.: US11074072B2
Publication date: 2021-07-27
Application No.: US16505012
Filing date: 2019-07-08
Applicant: Intel Corporation
Inventor: Kevin Nealis, Anbang Yao, Xiaoming Chen, Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha
Abstract: One embodiment provides for a compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including a multi-bit input value and a bipolar binary weight associated with a neural network and an arithmetic logic unit including a multiplier, an adder, and an accumulator register. To execute the decoded instruction, the multiplier is to perform a multiplication operation on the multi-bit input based on the bipolar binary weight to generate an intermediate product and the adder is to add the intermediate product to a value stored in the accumulator register and update the value stored in the accumulator register.
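The multiply-accumulate step described in this abstract can be sketched in software as follows. This is a minimal Python illustration, not the claimed hardware. The function names `bipolar_mac` and `bipolar_dot` are hypothetical, and the encoding of the bipolar weight as one bit (1 for +1, 0 for -1) is an assumption made for the example.

```python
def bipolar_mac(accumulator: int, x: int, weight_bit: int) -> int:
    """Return accumulator + x * w, where w is +1 if weight_bit is 1, else -1.

    The one-bit encoding of the bipolar weight is an assumption of this sketch.
    """
    if weight_bit not in (0, 1):
        raise ValueError("weight_bit must be 0 or 1")
    # With a bipolar weight, the multiplication reduces to keeping
    # or negating the multi-bit input.
    product = x if weight_bit == 1 else -x
    # The adder folds the intermediate product into the accumulator value.
    return accumulator + product


def bipolar_dot(inputs, weight_bits):
    """Accumulate a sequence of inputs against bipolar weights."""
    acc = 0
    for x, w in zip(inputs, weight_bits):
        acc = bipolar_mac(acc, x, w)
    return acc
```

Calling `bipolar_dot([3, 5, 2], [1, 0, 1])` accumulates 3 - 5 + 2.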
-
92.
Publication No.: US20210182058A1
Publication date: 2021-06-17
Application No.: US17169232
Filing date: 2021-02-05
Applicant: Intel Corporation
Inventor: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
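As a rough software analogy for a mixed precision multiply-add run across threads, the sketch below rounds matrix elements to half precision and accumulates in a wider format. The abstract does not name the precisions involved, so the choice of half-precision inputs with a double-precision accumulator is an assumption, as are the hypothetical names `to_half`, `row_times_matrix`, and `mixed_precision_matmul`. Python threads stand in for hardware threads only to show the decomposition by output row.

```python
import struct
from concurrent.futures import ThreadPoolExecutor


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def row_times_matrix(row, b):
    """Compute one output row: low-precision operands, wide accumulator."""
    out = []
    for j in range(len(b[0])):
        acc = 0.0  # Python floats are double precision (the wider format)
        for k, a_val in enumerate(row):
            # Multiply two half-precision values and add to the accumulator.
            acc = to_half(a_val) * to_half(b[k][j]) + acc
        out.append(acc)
    return out


def mixed_precision_matmul(a, b, workers=4):
    """Multiply matrices a and b, computing each output row in its own task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: row_times_matrix(row, b), a))
```

Keeping the accumulator wider than the operands is the usual motivation for mixed precision: the operands stay compact while rounding error in the running sum is limited.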
-
Publication No.: US10706498B2
Publication date: 2020-07-07
Application No.: US16417132
Filing date: 2019-05-20
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Nicolas C. Galoppo Von Borries
IPC: G06F17/16, H03M7/30, G06K9/62, G06T1/20, G06F9/30, G06F9/38, G06F12/0811, G06F12/0815, G06F12/0831, G06F12/0888, G06F9/48, G06N3/04, G06N3/08, G06T1/60, G06T15/00
Abstract: An apparatus to facilitate processing of a sparse matrix is disclosed. The apparatus includes a plurality of processing units each comprising one or more processing elements, including logic to read operands, a multiplication unit to multiply two or more operands and a scheduler to identify operands having a zero value and prevent scheduling of the operands having the zero value at the multiplication unit.
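The zero-skipping idea in this abstract can be illustrated with a small dot-product sketch. It is a software analogy under stated assumptions, not the claimed scheduler: the function name `schedule_nonzero_products` is hypothetical, and counting "issued" multiplications is simply a way to show how much work the skip avoids.

```python
def schedule_nonzero_products(a, b):
    """Return (dot_product, multiplications_issued) for vectors a and b.

    Any operand pair containing a zero is never sent to the multiplier,
    because its product cannot change the sum.
    """
    total = 0
    issued = 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            continue  # skip scheduling: the product would be zero
        total += x * y
        issued += 1
    return total, issued
```

For the vectors `[0, 2, 0, 4]` and `[1, 3, 5, 0]`, only the pair (2, 3) reaches the multiplier.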
-
Publication No.: US20200210338A1
Publication date: 2020-07-02
Application No.: US16727127
Filing date: 2019-12-26
Applicant: Intel Corporation
Inventor: Chandrasekaran Sakthivel, Prasoonkumar Surti, John C. Weast, Sara S. Baghsorkhi, Justin E. Gottschlich, Abhishek R. Appu, Nicolas C. Galoppo Von Borries, Joydeep Ray, Narayan Srinivasa, Feng Chen, Ben J. Ashbaugh, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Eriko Nurvitadhi, Balaji Vembu, Altug Koker
IPC: G06F12/0837, G06N3/08, G06N20/00, G06T1/20, G06F12/0815, G06N3/063, G06N3/04
Abstract: In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.
-
Publication No.: US20200034946A1
Publication date: 2020-01-30
Application No.: US16531763
Filing date: 2019-08-05
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.
-
96.
Publication No.: US10282465B2
Publication date: 2019-05-07
Application No.: US14311122
Filing date: 2014-06-20
Applicant: Intel Corporation
Inventor: Tsung-Han Lin, Hsiang-Tsung Kung
Abstract: Detailed herein are embodiments of systems, methods, and apparatuses to be used for feature searching using an entry-based searching structure.
-
97.
Publication No.: US20180315158A1
Publication date: 2018-11-01
Application No.: US15581182
Filing date: 2017-04-28
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
CPC classification number: G06T1/20, G06F9/3001, G06F9/3017, G06F9/3851, G06F9/3887, G06F9/3895, G06N3/0445, G06N3/0454, G06N3/063, G06N3/084
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
-
Publication No.: US20180174053A1
Publication date: 2018-06-21
Application No.: US15385031
Filing date: 2016-12-20
Applicant: Intel Corporation
Inventor: Tsung-Han Lin
Abstract: A spiking neural network (SNN) is implemented on a neuromorphic computer and includes a plurality of neurons, a first subset of a plurality of synapses defining feed-forward connections from a first subset of the neurons to a second subset of the neurons, a second subset of the plurality of synapses to define recurrent connections between the second subset of neurons, and a third subset of the plurality of synapses to define feedback connections from the second subset of neurons to the first subset of neurons. A set of input vectors is provided to iteratively modify weight values of the plurality of synapses. Each iteration involves selectively enabling and disabling the third subset of synapses with a different one of the input vectors applied to the SNN. The weight values are iteratively adjusted to derive a solution to an equation comprising an unknown matrix variable and an unknown vector variable.
-
Publication No.: US20170091655A1
Publication date: 2017-03-30
Application No.: US14865124
Filing date: 2015-09-25
Applicant: Intel Corporation
Inventor: Tsung-Han Lin, Gokce Keskin, Hsiang-Tsung Kung, She-Hwa Yen, Hong Wang
IPC: G06N99/00
CPC classification number: G06N20/00, G06F9/3836, G06F15/76
Abstract: A processor includes a front end to decode an instruction, an allocator to pass the instruction to a nearest neighbor logic unit (NNLU) to execute the instruction, and a retirement unit to retire the instruction. The NNLU includes logic to determine input of the instruction for which nearest neighbors will be calculated, transform the input, retrieve candidate atoms for which the nearest neighbors will be calculated, compute distance between the candidate atoms and the input, and determine the nearest neighbors for the input based upon the computed distance.
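The sequence of steps this abstract attributes to the NNLU (transform the input, retrieve candidate atoms, compute distances, select the nearest neighbors) can be sketched in software as below. The abstract does not say which transform or distance metric is used, so L2 normalization and squared Euclidean distance are assumptions chosen for illustration, and the names `normalize` and `nearest_neighbors` are hypothetical.

```python
import heapq
import math


def normalize(v):
    """Scale v to unit length (the assumed input transform)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def nearest_neighbors(query, atoms, k=1):
    """Return the indices of the k candidate atoms closest to the query."""
    q = normalize(query)  # step 1: transform the input

    def squared_distance(i):
        # step 3: distance between candidate atom i and the transformed input
        return sum((a - b) ** 2 for a, b in zip(atoms[i], q))

    # step 2 (candidate retrieval) is the full atom list in this sketch;
    # step 4 keeps the k smallest distances, nearest first.
    return heapq.nsmallest(k, range(len(atoms)), key=squared_distance)
```

With atoms `[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]` and query `[3.0, 0.5]`, the first atom is nearest and the second is next.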