-
公开(公告)号:US10242423B2
公开(公告)日:2019-03-26
申请号:US15789565
申请日:2017-10-20
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
Abstract: One embodiment provides an accelerator module comprising a memory stack including multiple memory dies; a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers, the GPU including a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction; the at least one single instruction to cause at least a portion of the GPU to perform a floating-point operation on input having differing precisions; and the floating-point operation is a two-dimensional matrix multiply and accumulate operation.
-
公开(公告)号:US20180299841A1
公开(公告)日:2018-10-18
申请号:US15489142
申请日:2017-04-17
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Linda L. Hurd , Dukhwan Kim , Mike B. MacPherson , John C. Weast , Justin E. Gottschlich , Jingyi Jin , Barath Lakshmanan , Chandrasekaran Sakthivel , Michael S. Strickland , Joydeep Ray , Kamal Sinha , Prasoonkumar Surti , Balaji Vembu , Ping T. Tang , Anbang Yao , Tatiana Shpeisman , Xiaoming Chen , Vasanth Ranganathan , Sanjeev S. Jahagirdar
Abstract: Methods and apparatus relating to autonomous vehicle neural network optimization techniques are described. In an embodiment, the difference between a first training dataset to be used for a neural network and a second training dataset to be used for the neural network is detected. The second training dataset is authenticated in response to the detection of the difference. The neural network is used to assist in an autonomous vehicle/driving. Other embodiments are also disclosed and claimed.
-
43.
公开(公告)号:US20250094170A1
公开(公告)日:2025-03-20
申请号:US18901027
申请日:2024-09-30
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F1/16 , G06F7/483 , G06F7/544 , G06F9/38 , G06F17/16 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N20/00 , G06T15/00 , G09G5/393
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
-
公开(公告)号:US12175252B2
公开(公告)日:2024-12-24
申请号:US17839856
申请日:2022-06-14
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Tatiana Shpeisman , Joydeep Ray , Ping T. Tang , Michael Strickland , Xiaoming Chen , Anbang Yao , Ben J. Ashbaugh , Linda L. Hurd , Liwei Ma
IPC: G06F9/38 , G06F9/30 , G06F9/50 , G06F13/40 , G06F13/42 , G06F15/80 , G06N3/00 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N20/00 , G06N20/10 , G06T1/20
Abstract: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread architecture, the general-purpose graphics compute unit to concurrently execute the first instruction and the second instruction.
-
45.
公开(公告)号:US12141578B2
公开(公告)日:2024-11-12
申请号:US17115989
申请日:2020-12-09
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F17/16 , G06N20/00 , G06T15/00
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
-
公开(公告)号:US12020135B2
公开(公告)日:2024-06-25
申请号:US17446101
申请日:2021-08-26
Applicant: Intel Corporation
Inventor: Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu
Abstract: A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment a trained convolutional neural network (CNN) model is processed into a trained CNN model via pruning, convolution window optimization, and quantization.
-
公开(公告)号:US11934934B2
公开(公告)日:2024-03-19
申请号:US15488551
申请日:2017-04-17
Applicant: Intel Corporation
Inventor: Liwei Ma , Elmoustapha Ould- Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu
Abstract: An apparatus to facilitate optimization of a convolutional neural network (CNN) is disclosed. The apparatus includes optimization logic to receive a CNN model having a list of instructions and including pruning logic to optimize the list of instructions by eliminating branches in the list of instructions that comprise a weight value of 0.
-
公开(公告)号:US20230401668A1
公开(公告)日:2023-12-14
申请号:US18456235
申请日:2023-08-25
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
IPC: G06T1/20 , G06F7/483 , G06N3/084 , G06F9/30 , G06N3/063 , G06F9/50 , G06F9/38 , G06N3/044 , G06N3/045 , G06N20/00
CPC classification number: G06T1/20 , G06F7/483 , G06N3/084 , G06F9/30185 , G06F9/30014 , G06N3/063 , G06F9/5044 , G06F9/3863 , G06N3/044 , G06N3/045 , G06N20/00 , G06F3/14
Abstract: One embodiment provides a general-purpose graphics processing unit comprising a dynamic precision floating-point unit including a control unit having precision tracking hardware logic to track an available number of bits of precision for computed data relative to a target precision, wherein the dynamic precision floating-point unit includes computational logic to output data at multiple precisions.
-
公开(公告)号:US11748298B2
公开(公告)日:2023-09-05
申请号:US17826674
申请日:2022-05-27
Applicant: Intel Corporation
Inventor: Altug Koker , Farshad Akhbari , Feng Chen , Dukhwan Kim , Narayan Srinivasa , Nadathur Rajagopalan Satish , Liwei Ma , Jeremy Bottleson , Eriko Nurvitadhi , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu
IPC: G06F15/80 , G06F13/40 , G06T1/20 , G06F9/30 , G06F13/00 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06N3/048
CPC classification number: G06F15/8007 , G06F9/3004 , G06F13/00 , G06F13/4027 , G06N3/044 , G06N3/045 , G06N3/048 , G06N3/063 , G06N3/084 , G06T1/20
Abstract: An integrated circuit (IC) package apparatus is disclosed. The IC package includes one or more processing units and a bridge, mounted below the one or more processing unit, including one or more arithmetic logic units (ALUs) to perform atomic operations.
-
公开(公告)号:US20220391679A1
公开(公告)日:2022-12-08
申请号:US17886055
申请日:2022-08-11
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang
Abstract: One embodiment provides a graphics processor comprising an instruction cache to store an instruction and a compute block configured to perform multiply-accumulate operations in response to execution of the instruction. The compute block includes a scheduler to schedule a plurality of threads for execution of the instruction and multiply-accumulate circuitry configured to execute the instruction via the plurality of threads, wherein the multiply-accumulate circuitry includes a plurality of functional units configured to process, in parallel via the plurality of threads, a corresponding plurality of matrix elements to multiply a first matrix and a second matrix, and to multiply the first matrix and the second matrix includes to multiply data elements in a row of the first matrix by corresponding data elements in a column of the second matrix to generate a plurality of products.
-
-
-
-
-
-
-
-
-