-
31.
公开(公告)号:US12039331B2
公开(公告)日:2024-07-16
申请号:US17967283
申请日:2022-10-17
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F1/16 , G06N20/00 , G06T15/00
CPC classification number: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F1/16 , G06F9/30025 , G06F9/3013 , G06F2207/3824 , G06N20/00 , G06T15/005
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
-
公开(公告)号:US20240004829A1
公开(公告)日:2024-01-04
申请号:US18350902
申请日:2023-07-12
Applicant: Intel Corporation
Inventor: Altug Koker , Farshad Akhbari , Feng Chen , Dukhwan Kim , Narayan Srinivasa , Nadathur Rajagopalan Satish , Liwei Ma , Jeremy Bottleson , Eriko Nurvitadhi , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu
IPC: G06F15/80 , G06F13/40 , G06T1/20 , G06F9/30 , G06F13/00 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06N3/048
CPC classification number: G06F15/8007 , G06F13/4027 , G06T1/20 , G06F9/3004 , G06F13/00 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06N3/048
Abstract: An integrated circuit (IC) package apparatus is disclosed. The IC package includes one or more processing units and a bridge, mounted below the one or more processing unit, including one or more arithmetic logic units (ALUs) to perform atomic operations.
-
33.
公开(公告)号:US11720355B2
公开(公告)日:2023-08-08
申请号:US17834482
申请日:2022-06-07
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G09G5/393 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/063 , G06N3/08 , G06N3/044 , G06N3/045 , G06T15/00 , G06N20/00 , G06F17/16
CPC classification number: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F9/3013 , G06F9/30025 , G06F17/16 , G06F2207/3824 , G06N20/00 , G06T15/005
Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.
-
34.
公开(公告)号:US20230046506A1
公开(公告)日:2023-02-16
申请号:US17967283
申请日:2022-10-17
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F7/483 , G06N3/063 , G06N3/04 , G06F9/38 , G06N3/08 , G09G5/393 , G06F7/544 , G06T15/00 , G06N20/00 , G06F17/16
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
-
公开(公告)号:US20220382555A1
公开(公告)日:2022-12-01
申请号:US17839856
申请日:2022-06-14
Applicant: Intel Corporation
Inventor: ELMOUSTAPHA OULD-AHMED-VALL , BARATH LAKSHMANAN , TATIANA SHPEISMAN , Joydeep Ray , Ping T. Tang , Michael Strickland , Xiaoming Chen , Anbang Yao , Ben J. Ashbaugh , Linda L. Hurd , Liwei Ma
IPC: G06F9/38 , G06F9/30 , G06F13/42 , G06F13/40 , G06N20/00 , G06T1/20 , G06N3/04 , G06N3/063 , G06N3/08 , G06N20/10 , G06F9/50 , G06F15/80 , G06N3/00
Abstract: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread architecture, the general-purpose graphics compute unit to concurrently execute the first instruction and the second instruction.
-
公开(公告)号:US11475286B2
公开(公告)日:2022-10-18
申请号:US17558285
申请日:2021-12-21
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang
Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.
-
37.
公开(公告)号:US11169799B2
公开(公告)日:2021-11-09
申请号:US16432402
申请日:2019-06-05
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/063 , G06N20/00 , G09G5/393 , G06N3/04 , G06N3/08 , G06T15/00
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
-
38.
公开(公告)号:US20210182058A1
公开(公告)日:2021-06-17
申请号:US17169232
申请日:2021-02-05
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
-
公开(公告)号:US10824938B2
公开(公告)日:2020-11-03
申请号:US15494723
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to perform one or more machine learning operations, wherein the decode unit, based on parameters of the one or more machine learning operations, is to request a scheduler to schedule the one or more machine learning operations to one of an array of programmable compute units and a fixed function compute unit.
-
40.
公开(公告)号:US10409614B2
公开(公告)日:2019-09-10
申请号:US15494773
申请日:2017-04-24
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Tatiana Shpeisman , Joydeep Ray , Ping T. Tang , Michael Strickland , Xiaoming Chen , Anbang Yao , Ben J. Ashbaugh , Linda L. Hurd , Liwei Ma
IPC: G06F9/38 , G06N3/00 , G06F15/80 , G06F9/50 , G06N3/08 , G06N3/063 , G06N3/04 , G06N20/00 , G06F9/30 , G06T1/20 , G06F13/42 , G06F13/40
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising instruction decode logic to decode a single instruction including multiple operands into a single decoded instruction, the multiple operands having differing precisions and a general-purpose graphics compute unit including a first logic unit and a second logic unit, the general-purpose graphics compute unit to execute the single decoded instruction, wherein to execute the single decoded instruction includes to perform a first instruction operation on a first set of operands of the multiple operands at a first precision and a simultaneously perform second instruction operation on a second set of operands of the multiple operands at a second precision.
-
-
-
-
-
-
-
-
-