-
公开(公告)号:US20180307950A1
公开(公告)日:2018-10-25
申请号:US15494710
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Kevin Nealis , Anbang Yao , Xiaoming Chen , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Eriko Nurvitadhi , Balaji Vembu , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Kamal Sinha
CPC classification number: G06K9/66 , G06F9/3001 , G06F9/3851 , G06F9/3887 , G06F9/3893 , G06F2207/4824 , G06K9/00973 , G06N3/0445 , G06N3/0454 , G06N3/063 , G06N3/084 , G06T1/20
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction that specifies multiple operands including an input value and a quantized weight value associated with a neural network and an arithmetic logic unit including a barrel shifter, an adder, and an accumulator register, wherein to execute the decoded instruction, the barrel shifter is to shift the input value by the quantized weight value to generate a shifted input value and the adder is to add the shifted input value to a value stored in the accumulator register and update the value stored in the accumulator register.
-
公开(公告)号:US20180293691A1
公开(公告)日:2018-10-11
申请号:US15482791
申请日:2017-04-09
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Nicolas C. Galoppo Von Borries
CPC classification number: G06T1/20 , G06F9/3001 , G06F9/3885 , G06F9/4881 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , G06F17/16 , G06F2212/1024 , G06F2212/302 , G06F2212/401 , G06F2212/621 , G06N3/04 , G06N3/08 , G06T1/60 , G06T15/005 , G06T2200/28
Abstract: An apparatus to facilitate processing of a sparse matrix is disclosed. The apparatus includes a plurality of processing units each comprising one or more processing elements, including logic to read operands, a multiplication unit to multiply two or more operands and a scheduler to identify operands having a zero value and prevent scheduling of the operands having the zero value at the multiplication unit.
-
公开(公告)号:US20180174024A1
公开(公告)日:2018-06-21
申请号:US15385504
申请日:2016-12-20
Applicant: Intel Corporation
Inventor: Tsung-Han Lin , Narayan Srinivasa
CPC classification number: G06N3/049 , G06F17/16 , G06N3/0454 , G06N3/063 , G06N3/0635 , G06N3/08
Abstract: A spiking neural network (SNN) is defined that includes artificial neurons interconnected by artificial synapses, the SNN defined to correspond to one or more numerical matrices in an equation such that weight values of the synapses correspond to values in the numerical matrices. An input vector is provided to the SNN to correspond to a numerical vector in the equation. A steady state spiking rate is determined for at least a portion of the neurons in the SNN and an approximate result of a matrix inverse problem corresponding to the equation is determined based on values of the steady state spiking rates.
-
公开(公告)号:US12210900B2
公开(公告)日:2025-01-28
申请号:US17746201
申请日:2022-05-17
Applicant: Intel Corporation
Inventor: Joydeep Ray , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Rajkishore Barik , Eriko Nurvitadhi , Nicolas Galoppo Von Borries , Tsung-Han Lin , Sanjeev Jahagirdar , Vasanth Ranganathan
Abstract: A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated a similar dependency to avoid dependency conflicts.
-
公开(公告)号:US20240257294A1
公开(公告)日:2024-08-01
申请号:US18436494
申请日:2024-02-08
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
CPC classification number: G06T1/20 , G06F9/45533 , G06F9/5061 , G06F9/5094 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06F8/41 , G06F2009/45583
Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
-
76.
公开(公告)号:US12039331B2
公开(公告)日:2024-07-16
申请号:US17967283
申请日:2022-10-17
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F1/16 , G06N20/00 , G06T15/00
CPC classification number: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F1/16 , G06F9/30025 , G06F9/3013 , G06F2207/3824 , G06N20/00 , G06T15/005
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
-
公开(公告)号:US20240078629A1
公开(公告)日:2024-03-07
申请号:US18466991
申请日:2023-09-14
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Nicolas C. Galoppo Von Borries
IPC: G06T1/20 , G06F9/30 , G06F9/38 , G06F9/48 , G06F12/02 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , G06F17/16 , G06F18/2136 , G06N3/04 , G06N3/08 , G06N20/00 , G06T1/60 , G06T15/00 , H03M7/30
CPC classification number: G06T1/20 , G06F9/3001 , G06F9/3885 , G06F9/4881 , G06F12/0207 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , G06F17/16 , G06F18/2136 , G06N3/04 , G06N3/08 , G06N20/00 , G06T1/60 , G06T15/005 , H03M7/30 , G06F2212/1024 , G06F2212/302 , G06F2212/401 , G06F2212/621 , G06T2200/28
Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
-
公开(公告)号:US11922535B2
公开(公告)日:2024-03-05
申请号:US18168207
申请日:2023-02-13
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
CPC classification number: G06T1/20 , G06F9/45533 , G06F9/5061 , G06F9/5094 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06F8/41 , G06F2009/45583
Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
-
公开(公告)号:US11803935B2
公开(公告)日:2023-10-31
申请号:US17881720
申请日:2022-08-05
Applicant: Intel Corporation
Inventor: Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Nicolas C. Galoppo Von Borries
IPC: G06F17/16 , G06T1/20 , G06F9/30 , G06F9/38 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , H03M7/30 , G06N20/00 , G06F12/02 , G06F18/2136 , G06F9/48 , G06N3/04 , G06N3/08 , G06T1/60 , G06T15/00
CPC classification number: G06T1/20 , G06F9/3001 , G06F9/3885 , G06F9/4881 , G06F12/0207 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , G06F17/16 , G06F18/2136 , G06N3/04 , G06N3/08 , G06N20/00 , G06T1/60 , G06T15/005 , H03M7/30 , G06F2212/1024 , G06F2212/302 , G06F2212/401 , G06F2212/621 , G06T2200/28
Abstract: Techniques to improve performance of matrix multiply operations are described in which a compute kernel can specify one or more element-wise operations to perform on output of the compute kernel before the output is transferred to higher levels of a processor memory hierarchy.
-
公开(公告)号:US20230260072A1
公开(公告)日:2023-08-17
申请号:US18168207
申请日:2023-02-13
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
CPC classification number: G06T1/20 , G06F9/45533 , G06F9/5061 , G06F9/5094 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06F8/41
Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
-
-
-
-
-
-
-
-
-