-
公开(公告)号:US20230334316A1
公开(公告)日:2023-10-19
申请号:US18314450
申请日:2023-05-09
Applicant: Intel Corporation
Inventor: Altug Koker , Abhishek R. Appu , Kamal Sinha , Joydeep Ray , Balaji Vembu , Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , John C. Weast , Justin E. Gottschlich , Prasoonkumar Surti , Chandrasekaran Sakthivel , Farshad Akhbari , Nadathur Rajagopalan Satish , Liwei Ma , Jeremy Bottleson , Eriko Nurvitadhi , Travis T. Schluessler , Ankur N. Shah , Jonathan Kennedy , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: Described herein is a graphics processor comprising a memory device and a graphics processing cluster coupled with the memory device. The graphics processing cluster includes a plurality of graphics multiprocessors interconnected via a data interconnect. A graphics multiprocessor includes circuitry configured to load a modular neural network including a plurality of subnetworks, each of the plurality of subnetworks trained to perform a computer vision operation on a separate subject.
-
公开(公告)号:US11762804B2
公开(公告)日:2023-09-19
申请号:US17868448
申请日:2022-07-19
Applicant: Intel Corporation
Inventor: Joydeep Ray , Aravindh Anantaraman , Abhishek R. Appu , Altug Koker , Elmoustapha Ould-Ahmed-Vall , Valentin Andrei , Subramaniam Maiyuran , Nicolas Galoppo Von Borries , Varghese George , Mike MacPherson , Ben Ashbaugh , Murali Ramadoss , Vikranth Vemulapalli , William Sadler , Jonathan Pearce , Sungye Kim
CPC classification number: G06F15/8069 , G06F9/30163 , G06F9/3877 , G06T15/005 , G06F9/3836
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US11748106B2
公开(公告)日:2023-09-05
申请号:US17683564
申请日:2022-03-01
Applicant: Intel Corporation
Inventor: Liwei Ma , Nadathur Rajagopalan Satish , Jeremy Bottleson , Farshad Akhbari , Eriko Nurvitadhi , Abhishek R. Appu , Altug Koker , Kamal Sinha , Joydeep Ray , Balaji Vembu , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/38
CPC classification number: G06F9/3832
Abstract: A mechanism is described for facilitating fast data operations and for facilitating a finite state machine for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting input data to be used in computational tasks by a computation component of a processor including a graphics processor. The method may further include determining one or more frequently-used data values (FDVs) from the data, and pushing the one or more frequent data values to bypass the computational tasks.
-
公开(公告)号:US20230260072A1
公开(公告)日:2023-08-17
申请号:US18168207
申请日:2023-02-13
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti , Narayan Srinivasa , Feng Chen , Joydeep Ray , Ben J. Ashbaugh , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Balaji Vembu , Tsung-Han Lin , Kamal Sinha , Rajkishore Barik , Sara S. Baghsorkhi , Justin E. Gottschlich , Altug Koker , Nadathur Rajagopalan Satish , Farshad Akhbari , Dukhwan Kim , Wenyin Fu , Travis T. Schluessler , Josh B. Mastronarde , Linda L. Hurd , John H. Feit , Jeffery S. Boles , Adam T. Lake , Karthik Vaidyanathan , Devan Burke , Subramaniam Maiyuran , Abhishek R. Appu
CPC classification number: G06T1/20 , G06F9/45533 , G06F9/5061 , G06F9/5094 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06F8/41
Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
-
公开(公告)号:US11726793B2
公开(公告)日:2023-08-15
申请号:US17095585
申请日:2020-11-11
Applicant: Intel Corporation
Inventor: Christopher J. Hughes , Prasoonkumar Surti , Guei-Yuan Lueh , Adam T. Lake , Jill Boyce , Subramaniam Maiyuran , Lidong Xu , James M. Holland , Vasanth Ranganathan , Nikos Kaburlasos , Altug Koker , Abhishek R. Appu
IPC: G06F9/38 , G06F12/084 , G06T1/60 , G06F9/50 , G06F9/54
CPC classification number: G06F9/3891 , G06F9/5066 , G06F9/544 , G06F12/084 , G06T1/60
Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive data dependencies for one or more tasks comprising one or more producer tasks executing on the first processing resource and one or more consumer tasks executing on the second processing resource and move a data output from one or more producer tasks executing on the first processing resource to a cache memory communicatively coupled to the second processing resource. Other embodiments may be described and claimed.
-
196.
公开(公告)号:US11720355B2
公开(公告)日:2023-08-08
申请号:US17834482
申请日:2022-06-07
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G09G5/393 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/063 , G06N3/08 , G06N3/044 , G06N3/045 , G06T15/00 , G06N20/00 , G06F17/16
CPC classification number: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F9/3013 , G06F9/30025 , G06F17/16 , G06F2207/3824 , G06N20/00 , G06T15/005
Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.
-
公开(公告)号:US20230205587A1
公开(公告)日:2023-06-29
申请号:US17560744
申请日:2021-12-23
Applicant: Intel Corporation
Inventor: Zamshed Iqbal Chowdhury , Joydeep Ray , Chunhui Mei , Yongsheng Liu , Vasanth Ranganathan , Abhishek R. Appu , Aravindh Anantaraman
CPC classification number: G06F9/5027 , G06F9/541 , G06F2209/505
Abstract: Thread group dispatch in a clustered graphics architecture is described. An example of an apparatus includes of compute front end (CFE) clusters to receive dispatched thread groups, the CFE clusters including at least a first CFE cluster and a second CFE cluster; processing resources coupled with the CFE clusters to execute threads within thread groups; and cache clusters to cache data including thread groups, wherein the apparatus is to receive thread groups for dispatch, and to dispatch the thread groups to the CFE clusters according to a dispatch operation, the dispatch operation including dispatching multiple thread groups to each of multiple CFEs in the first CFE cluster and multiple thread groups to each of multiple CFEs in the second CFE cluster.
-
公开(公告)号:US11663746B2
公开(公告)日:2023-05-30
申请号:US17095544
申请日:2020-11-11
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Prasoonkumar Surti , Jill Boyce , Subramaniam Maiyuran , Michael Apodaca , Adam T. Lake , James Holland , Vasanth Ranganathan , Altug Koker , Lidong Xu , Nikos Kaburlasos
CPC classification number: G06T9/002 , G06N3/045 , G06T9/007 , G06T9/008 , G06T15/005
Abstract: Embodiments described herein provided for an instruction and associated logic to enable a processing resource including a tensor accelerator to perform optimized computation of sparse submatrix operations. One embodiment provides hardware logic to apply a numerical transform to matrix data to increase the sparsity of the data. Increasing the sparsity may result in a higher compression ratio when the matrix data is compressed.
-
公开(公告)号:US20230142472A1
公开(公告)日:2023-05-11
申请号:US17959374
申请日:2022-10-04
Applicant: Intel Corporation
Inventor: Eric J. Asperheim , Subramaniam Maiyuran , Kiran C. Veernapu , Sanjeev S. Jahagirdar , Balaji Vembu , Devan Burke , Philip R. Laws , Kamal Sinha , Abhishek R. Appu , Elmoustapha Ould-Ahmed-Vall , Peter L. Doyle , Joydeep Ray , Travis T. Schluessler , John H. Feit , Nikos Kaburlasos , Jacek Kwiatkowski , Altug Koker
IPC: G06F3/14 , G06F3/01 , G09G5/391 , G06F3/0484
CPC classification number: G06F3/1438 , G06F3/013 , G09G5/391 , G06F3/0484 , G09G2354/00 , G09G2352/00 , G09G2360/08 , G09G2340/0435 , G09G2360/121 , G09G5/001
Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.
-
公开(公告)号:US20230123644A1
公开(公告)日:2023-04-20
申请号:US17960928
申请日:2022-10-06
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Prasoonkumar Surti , Joydeep Ray , Michael J. Norris
Abstract: One embodiment provides a graphics processor comprising an interface to a system interconnect and a graphics processor coupled to the interface, the graphics processor comprising circuitry configured to compact sample data for multiple sample locations of a pixel, map the multiple sample locations to memory locations that store compacted sample data, the memory locations in a memory of the graphics processor, apply lossless compression to the compacted sample data, and update a compression control surface associated with the memory locations, the compression control surface to specify a compression status for the memory locations
-
-
-
-
-
-
-
-
-