-
Publication number: US20180307494A1
Publication date: 2018-10-25
Application number: US15494773
Filing date: 2017-04-24
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Tatiana Shpeisman , Joydeep Ray , Ping T. Tang , Michael Strickland , Xiaoming Chen , Anbang Yao , Ben J. Ashbaugh , Linda L. Hurd , Liwei Ma
CPC classification number: G06F9/3887 , G06F1/32 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/30094 , G06F9/30109 , G06F9/30112 , G06F9/3016 , G06F9/3851 , G06F9/3891 , G06F9/50 , G06F13/4068 , G06F13/4282 , G06F15/80 , G06F2213/0026 , G06N3/00 , G06N3/0445 , G06N3/0454 , G06N3/063 , G06N3/084 , G06N20/00 , G06T1/20
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising instruction decode logic to decode a single instruction including multiple operands into a single decoded instruction, the multiple operands having differing precisions, and a general-purpose graphics compute unit including a first logic unit and a second logic unit, the general-purpose graphics compute unit to execute the single decoded instruction, wherein to execute the single decoded instruction includes to perform a first instruction operation on a first set of operands of the multiple operands at a first precision and to simultaneously perform a second instruction operation on a second set of operands of the multiple operands at a second precision.
-
Publication number: US20250005703A1
Publication date: 2025-01-02
Application number: US18773094
Filing date: 2024-07-15
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Linda L. Hurd , Dukhwan Kim , Mike B. Macpherson , John C. Weast , Feng Chen , Farshad Akhbari , Narayan Srinivasa , Nadathur Rajagopalan Satish , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman
IPC: G06T1/20 , G06F3/14 , G06F9/30 , G06F9/38 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06T15/00 , G06T15/04 , G09G5/36
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core including mixed-precision execution circuitry to execute one or more of the mixed-precision instructions to perform a mixed-precision dot-product operation comprising to perform a set of multiply and accumulate operations.
-
Publication number: US20240184572A1
Publication date: 2024-06-06
Application number: US18528340
Filing date: 2023-12-04
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06F17/16 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N20/00 , G06T15/00 , G09G5/393
CPC classification number: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F9/30025 , G06F9/3013 , G06F17/16 , G06F2207/3824 , G06N20/00 , G06T15/005
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
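The arithmetic named in this abstract (products of 16-bit operands summed at 32 bits) can be illustrated with a minimal software sketch. It assumes the 16-bit operands are IEEE 754 half-precision floats and the 32-bit sum is single precision; the abstract does not fix either format, and the helper names `to_fp16`, `to_fp32`, and `matmul_acc` are hypothetical. This emulates the numerics only and says nothing about the claimed hardware.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def matmul_acc(a, b, c):
    """Return a @ b + c, rounding inputs of a and b to fp16 and accumulating in fp32.

    Hypothetical helper: a is rows x inner, b is inner x cols, c is rows x cols,
    all given as lists of lists of floats.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[to_fp32(c[i][j]) for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = out[i][j]
            for k in range(inner):
                # Intermediate product of two 16-bit operands.
                prod = to_fp16(a[i][k]) * to_fp16(b[k][j])
                # 32-bit running sum based on the intermediate product.
                acc = to_fp32(acc + prod)
            out[i][j] = acc
    return out
```

The wider accumulator matters when a running sum outgrows the 11-bit significand of fp16: adding 1 to 2048 is lost if the sum is rounded back to fp16, but is kept when the sum is held at 32 bits.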
-
Publication number: US11948224B2
Publication date: 2024-04-02
Application number: US17978573
Filing date: 2022-11-01
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
IPC: G06T1/20 , G06F3/14 , G06F7/483 , G06F9/30 , G06F9/38 , G06F9/50 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N3/084 , G06N20/00 , G06T1/60 , G06T15/00
CPC classification number: G06T1/20 , G06F7/483 , G06F9/30014 , G06F9/30185 , G06F9/3863 , G06F9/5044 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N20/00 , G06F3/14 , G06T1/60 , G06T15/005
Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.
-
Publication number: US20240005136A1
Publication date: 2024-01-04
Application number: US18351124
Filing date: 2023-07-12
Applicant: Intel Corporation
Inventor: Kamal Sinha , Balaji Vembu , Eriko Nurvitadhi , Nicolas C. Galoppo Von Borries , Rajkishore Barik , Tsung-Han Lin , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Farshad Akhbari , Narayan Srinivasa , Feng Chen , Dukhwan Kim , Nadathur Rajagopalan Satish , John C. Weast , Mike B. MacPherson , Linda L. Hurd , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30 , G06T15/00 , G06F15/78 , G06F15/76 , G06F1/3287 , G06F1/3293 , G06N3/084 , G06N3/044 , G06N3/045
CPC classification number: G06N3/063 , G06N3/08 , G06N3/04 , G06T1/20 , G06F9/30014 , G06T15/005 , G06F15/78 , G06F15/76 , G06F9/30036 , G06F1/3287 , G06F1/3293 , G06N3/084 , G06N3/044 , G06N3/045 , G06T1/60
Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
-
Publication number: US20230315481A1
Publication date: 2023-10-05
Application number: US18312079
Filing date: 2023-05-04
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Tatiana Shpeisman , Joydeep Ray , Ping T. Tang , Michael Strickland , Xiaoming Chen , Anbang Yao , Ben J. Ashbaugh , Linda L. Hurd , Liwei Ma
IPC: G06F9/38 , G06F9/30 , G06F13/42 , G06F13/40 , G06N20/00 , G06T1/20 , G06N3/063 , G06N3/084 , G06N20/10 , G06N3/044 , G06N3/045 , G06F9/50 , G06F15/80 , G06N3/00
CPC classification number: G06F9/3887 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/30094 , G06F9/30109 , G06F9/30112 , G06F9/3016 , G06F9/3851 , G06F9/3891 , G06F9/50 , G06F13/4068 , G06F13/4282 , G06F15/80 , G06N3/00 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N20/00 , G06N20/10 , G06T1/20 , G06F2213/0026
Abstract: Described herein is a general-purpose graphics processing unit including a multiprocessor having a single instruction, multiple thread, SIMT, architecture. The multiprocessor comprises multiple sets of compute units each having a first logic unit configured to perform floating-point operations and a second logic unit configured to perform integer operations, with a thread of the floating-point instruction being executed in parallel with a thread of the integer instruction.
-
Publication number: US11727246B2
Publication date: 2023-08-15
Application number: US16283021
Filing date: 2019-02-22
Applicant: Intel Corporation
Inventor: Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu
Abstract: Embodiments provide systems and methods which facilitate optimization of a convolutional neural network (CNN). One embodiment provides for a non-transitory machine-readable medium storing instructions that cause one or more processors to perform operations comprising processing a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format. Processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an integer format. Quantizing the weights includes generating a quantization table to enable non-uniform quantization of the weights and quantizing the weights from the floating-point format to the integer format using the quantization table. The operations additionally comprise performing an inference operation utilizing the processed CNN with the integer format weights.
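The table-based, non-uniform quantization described in this abstract can be sketched as follows. The abstract does not say how the quantization table is generated, so the quantile-based construction in `build_table` is an assumption chosen for illustration, and all three function names are hypothetical. The integer-format weights here are simply indices into the table.

```python
from bisect import bisect_left

def build_table(weights, levels):
    """Pick up to `levels` representative values at evenly spaced quantiles.

    Assumed scheme: dense regions of the weight distribution receive more
    table entries, which is what makes the quantization non-uniform.
    """
    s = sorted(weights)
    n = len(s)
    picks = {s[min(n - 1, (2 * i + 1) * n // (2 * levels))] for i in range(levels)}
    return sorted(picks)

def quantize(weights, table):
    """Map each floating-point weight to the integer index of the nearest table entry."""
    codes = []
    for w in weights:
        pos = bisect_left(table, w)
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(table)]
        codes.append(min(candidates, key=lambda i: abs(table[i] - w)))
    return codes

def dequantize(codes, table):
    """Recover approximate floating-point weights by table lookup."""
    return [table[c] for c in codes]
```

For example, with the weights `[-1.0, -0.1, 0.0, 0.05, 0.1, 2.0]` and four levels, two of the four table entries fall in the crowded range near zero, whereas evenly spaced levels over the range -1.0 to 2.0 would place only one there.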
-
Publication number: US20230061331A1
Publication date: 2023-03-02
Application number: US17960611
Filing date: 2022-10-05
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
IPC: G06T1/20 , G06F7/483 , G06N3/08 , G06F9/30 , G06N3/04 , G06N3/063 , G06F9/50 , G06F9/38 , G06N20/00
Abstract: One embodiment provides a multi-chip module accelerator usable to execute tensor data processing operations. The multi-chip module may include a memory stack including multiple memory dies and parallel processor circuitry communicatively coupled to the memory stack. The parallel processor circuitry may include multiprocessor cores to execute matrix multiplication and accumulate operations. The matrix multiplication and accumulate operations may include floating-point operations that are configurable to include two-dimensional matrix multiply and accumulate operations involving inputs that have differing floating-point precisions. The floating-point operations may include a first operation at a first precision and a second operation at a second precision. The first operation may include a multiply having at least one 16-bit floating-point input and the second operation may include an accumulate having a 32-bit floating-point input.
-
Publication number: US11360767B2
Publication date: 2022-06-14
Application number: US17305355
Filing date: 2021-07-06
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G09G5/393 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/04 , G06N3/063 , G06N3/08 , G06T15/00 , G06N20/00 , G06F17/16
Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
-
Publication number: US20220114430A1
Publication date: 2022-04-14
Application number: US17558285
Filing date: 2021-12-21
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang
Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.
-