-
公开(公告)号:US10891773B2
公开(公告)日:2021-01-12
申请号:US15482677
申请日:2017-04-07
Applicant: Intel Corporation
Inventor: Joydeep Ray , Abhishek R. Appu , Pattabhiraman K , Balaji Vembu , Altug Koker , Niranjan L. Cooray , Josh B. Mastronarde
IPC: G06T15/00 , G06F9/455 , G06T1/60 , G09G5/36 , G09G5/00 , G09G5/393 , G06F9/48 , G06F9/50 , G06T15/04 , G06T15/80 , G06T17/10 , G06T17/20
Abstract: An apparatus and method are described for allocating local memories to virtual machines. For example, one embodiment of an apparatus comprises: a command streamer to queue commands from a plurality of virtual machines (VMs) or applications, the commands to be distributed from the command streamer and executed by graphics processing resources of a graphics processing unit (GPU); a tile cache to store graphics data associated with the plurality of VMs or applications as the commands are executed by the graphics processing resources; and tile cache allocation hardware logic to allocate a first portion of the tile cache to a first VM or application and a second portion of the tile cache to a second VM or application; the tile cache allocation hardware logic to further allocate a first region in system memory to store spill-over data when the first portion of the tile cache and/or the second portion of the file cache becomes full.
-
公开(公告)号:US10860468B2
公开(公告)日:2020-12-08
申请号:US16379137
申请日:2019-04-09
Applicant: Intel Corporation
Inventor: Altug Koker , Joydeep Ray , Niranjan L. Cooray , Abhishek R. Appu
IPC: G06F12/00 , G06F12/0875 , G06F9/54 , G06T1/60 , G06F12/0811
Abstract: An apparatus to facilitate guaranteed forward progress for graphics data is disclosed. The apparatus includes a plurality of ports to receive and transmit streams of graphics data, one or more buffers associated with each of the plurality of ports to store the graphics data and switching logic to virtually partition each of the one or more buffers to allocate a dedicated buffer to receive each of a plurality of independent streams of graphics data.
-
公开(公告)号:US10853906B2
公开(公告)日:2020-12-01
申请号:US16197821
申请日:2018-11-21
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
IPC: G06T1/20 , G06F7/483 , G06N3/08 , G06F9/30 , G06N3/04 , G06N3/063 , G06F9/50 , G06F9/38 , G06N20/00 , G06F3/14 , G06T1/60 , G06T15/00
Abstract: One embodiment provides an accelerator module comprising a memory stack including multiple memory dies; a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers, the GPU including a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction. The at least one single instruction is to cause at least a portion of the GPU to perform a floating point operation on input having differing precisions. The floating point operation is a two-dimensional matrix multiply and accumulate operation.
-
公开(公告)号:US10817042B2
公开(公告)日:2020-10-27
申请号:US16144538
申请日:2018-09-27
Applicant: Intel Corporation
Inventor: Kinchit Desai , Sanjeev Jahagirdar , Prasoonkumar Surti , Joydeep Ray
IPC: G06F1/3237 , G06N3/04 , G06N3/08 , G06F1/3234 , G06F1/3206
Abstract: Embodiments are generally directed to providing power savings for a neural network architecture with zero activations during inference. An embodiment of an apparatus includes one or more processors including one or more processor cores; and a memory to store data for processing including neural network processing, wherein the apparatus to perform a fast clear operation to initialize activation buffers for a neural network by updating metadata to indicate zero values, the neural network including a plurality of layers, wherein the apparatus is to compare outputs for the neural network to the metadata values and to write an output to memory only if the output is non-zero.
-
公开(公告)号:US20200334200A1
公开(公告)日:2020-10-22
申请号:US16869223
申请日:2020-05-07
Applicant: Intel Corporation
Inventor: Altug Koker , Prasoonkumar Surti , David Puffer , Subramaniam Maiyuran , Guei-Yuan Lueh , Abhishek R. Appu , Joydeep Ray , Balaji Vembu , Tomer Bar-On , Andrew T. Lauritzen , Hugues Labbe , John G. Gierach , Gabor Liktor
Abstract: In an example, an apparatus comprises a plurality of execution units, and a first memory communicatively couple to the plurality of execution units, wherein the first shared memory is shared by the plurality of execution units and a copy engine to copy context state data from at least a first of the plurality of execution units to the first shared memory. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20200327635A1
公开(公告)日:2020-10-15
申请号:US16714862
申请日:2019-12-16
Applicant: Intel Corporation
Inventor: Altug Koker , Balaji Vembu , Joydeep Ray , James A. Valerio , Abhishek R. Appu
Abstract: One embodiment provides for a general-purpose graphics processing unit comprising a processing array including multiple compute blocks, each compute block including multiple processing clusters and a thread dispatch unit to dispatch threads of a workload to the multiple compute blocks based on a parallelism metric, wherein the thread dispatch unit, based on the parallelism metric, is to perform one of a first operation and a second operation, the first operation to distribute threads across the multiple compute blocks and the second operation is to concentrate threads within one of the multiple compute blocks.
-
公开(公告)号:US10802970B1
公开(公告)日:2020-10-13
申请号:US16366266
申请日:2019-03-27
Applicant: Intel Corporation
Inventor: Niranjan L. Cooray , Altug Koker , Vidhya Krishnan , Ronald W. Silvas , John H. Feit , Prasoonkumar Surti , Joydeep Ray , Abhishek R. Appu
IPC: G06F12/0837 , G06F9/38 , G06F16/907 , H04L9/06 , G06F12/0811
Abstract: Embodiments described herein provide an apparatus comprising a processor to allocate a first memory space for data for a graphics workload, the first memory comprising a first plurality of addressable memory locations, allocate a second memory space for compression metadata relating to the data for the graphics workload, the second memory space comprising a second plurality of addressable memory locations and having an amount of memory corresponding to a predetermined ratio of the amount of memory allocated to first memory space, and configure a direct memory mapping between the first plurality of addressable memory locations and the second plurality of addressable memory locations. Other embodiments may be described and claimed.
-
公开(公告)号:US10783084B2
公开(公告)日:2020-09-22
申请号:US16702073
申请日:2019-12-03
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Atlug Koker , Joydeep Ray , David Puffer , Prasoonkumar Surti , Lakshminarayanan Striramassarma , Vasanth Ranganathan , Kiran C. Veernapu , Balaji Vembu , Pattabhiraman K
IPC: G06F12/0877 , G06F12/0868 , G06F12/0846 , G06F12/0855 , G06F12/0802 , G06F12/0806 , G06F12/0893 , G06F12/126 , G06T1/60
Abstract: In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20200294182A1
公开(公告)日:2020-09-17
申请号:US16355573
申请日:2019-03-15
Applicant: Intel Corporation
Inventor: Varghese George , Altug Koker , Aravindh Anantaraman , Subramaniam Maiyuran , SungYe Kim , Valentin Andrei , Elmoustapha Ould-Ahmed-Vall , Joydeep Ray , Abhishek R. Appu , Nicolas C. Galoppo von Borries , Prasoonkumar Surti , Mike Macpherson
Abstract: Apparatuses including general-purpose graphics processing units having on chip dense memory for temporal buffering are disclosed. In one embodiment, a graphics multiprocessor includes a plurality of compute engines to perform first computations to generate a first set of data, cache for storing data, and a high density memory that is integrated on chip with the plurality of compute engines and the cache. The high density memory to receive the first set of data, to temporarily store the first set of data, and to provide the first set of data to the cache during a first time period that is prior to a second time period when the plurality of compute engines will use the first set of data for second computations.
-
公开(公告)号:US20200286201A1
公开(公告)日:2020-09-10
申请号:US16297129
申请日:2019-03-08
Applicant: Intel Corporation
Inventor: James Valerio , Vasanth Ranganathan , Joydeep Ray , Abhishek R. Appu , Ben J. Ashbaugh , Brandon Fliflet , Jeffery S. Boles , Srinivasan Embar Raghukrishnan , Rahul Kulkarni
Abstract: Embodiments described herein provide an apparatus comprising a processor to configure a plurality of contexts of a command engine to execute a graphics workload comprising a plurality of walkers, allocate, from a pool of execution units of a graphics processor, a subset of execution units to each walker in the plurality of walkers based at least in part on the predetermined number of walkers configured for the context, for each context in the plurality of contexts, dispatch one or more walkers of the plurality of walkers to the execution units, and upon dispatch of the one or more walkers of the plurality of walkers, write an opcode to a computer-readable memory indicating that the dispatch of the walker is complete, wherein the opcode comprises dependency data for the one or more walkers of the plurality of walkers. Other embodiments may be described and claimed.
-
-
-
-
-
-
-
-
-