-
Publication Number: US20240045830A1
Publication Date: 2024-02-08
Application Number: US18450685
Filing Date: 2023-08-16
Applicant: Intel Corporation
Inventor: Joydeep RAY , Aravindh ANANTARAMAN , Abhishek R. APPU , Altug KOKER , Elmoustapha OULD-AHMED-VALL , Valentin ANDREI , Subramaniam MAIYURAN , Nicolas GALOPPO VON BORRIES , Varghese GEORGE , Mike MACPHERSON , Ben ASHBAUGH , Murali RAMADOSS , Vikranth VEMULAPALLI , William SADLER , Jonathan PEARCE , Sungye KIM
CPC classification number: G06F15/8069 , G06F9/30163 , G06F9/3877 , G06T15/005 , G06F9/3836
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
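The dispatch scheme described in this abstract can be illustrated with a minimal sketch: operations tagged as scalar or vector are routed to separate execution complexes, each producing its own set of outputs. The `Op` type, the `is_vector` flag, and the executor functions below are illustrative assumptions, not the patented implementation.

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    is_vector: bool  # hypothetical tag; real hardware would decode this

def run_scalar(ops):
    # stand-in for the scalar processor complex
    return [f"scalar:{op.name}" for op in ops]

def run_vector(ops):
    # stand-in for the vector processor complex
    return [f"vector:{op.name}" for op in ops]

def dispatch(workload):
    # split the workload into the two subsets described in the abstract
    scalar_ops = [op for op in workload if not op.is_vector]
    vector_ops = [op for op in workload if op.is_vector]
    return run_scalar(scalar_ops), run_vector(vector_ops)

workload = [Op("loop_counter", False), Op("vec_madd", True), Op("branch", False)]
scalar_out, vector_out = dispatch(workload)
```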
-
Publication Number: US20240020911A1
Publication Date: 2024-01-18
Application Number: US17826090
Filing Date: 2022-05-26
Applicant: Intel Corporation
Inventor: Michael NORRIS , Abhishek R. APPU , Prasoonkumar SURTI , Karthik VAIDYANATHAN
Abstract: Apparatus and method for routing data from ray tracing cache banks. For example, one embodiment of an apparatus comprises: ray traversal hardware logic to perform traversal operations to traverse rays through a bounding volume hierarchy (BVH) comprising a plurality of BVH nodes, the ray traversal hardware logic comprising a plurality of traversal storage banks to store traversal data associated with the BVH nodes and/or the rays as the ray traversal hardware logic performs the traversal operations; a cache comprising a plurality of cache banks to store the traversal data prior to being moved into the traversal storage banks for processing by the ray traversal hardware logic; and an inter-bank interconnect comprising: a point-to-point switch matrix to couple any of the cache banks to any of the traversal storage banks; and an arbiter/allocator to control the point-to-point switch matrix to establish a particular group of interconnections between the cache banks and the traversal storage banks in a given clock cycle.
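The arbiter/allocator's per-cycle job can be sketched as a matching problem: each cache bank and each traversal storage bank may carry at most one connection per clock. The request format and first-come priority policy below are assumptions for illustration only.

```python
def arbitrate(requests):
    """requests: list of (cache_bank, traversal_bank) pairs requested this cycle.
    Returns the subset granted, at most one grant per source and per destination."""
    granted, src_busy, dst_busy = [], set(), set()
    for src, dst in requests:
        # grant only if both endpoints of the point-to-point link are free
        if src not in src_busy and dst not in dst_busy:
            granted.append((src, dst))
            src_busy.add(src)
            dst_busy.add(dst)
    return granted

# Banks 2 (destination) and 0 (source) are each requested twice; only the
# first request for each is granted in this cycle.
grants = arbitrate([(0, 2), (1, 2), (0, 3), (3, 1)])
```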
-
Publication Number: US20230297419A1
Publication Date: 2023-09-21
Application Number: US17699992
Filing Date: 2022-03-21
Applicant: Intel Corporation
Inventor: Abhishek R. APPU , Joydeep RAY , Karthik VAIDYANATHAN , Sreedhar CHALASANI , Vasanth RANGANATHAN
IPC: G06F9/48 , G06F9/50 , G06F12/0891
CPC classification number: G06F9/4881 , G06F9/505 , G06F9/5016 , G06F12/0891
Abstract: Bank aware thread scheduling and early dependency clearing techniques are described herein. In one example, bank aware thread scheduling involves arbitrating and scheduling threads based on the cache bank that is to be accessed by the instructions, to avoid bank conflicts. Early dependency clearing involves clearing dependencies for cache loads in a scoreboard before the data is loaded. In early dependency clearing for loads, delays in operation can be reduced by clearing dependencies before data is required from the cache.
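The bank-aware arbitration idea can be sketched as follows: per cycle, issue at most one ready thread per cache bank, deferring any thread whose target bank is already taken. The thread/bank representation is a hypothetical simplification.

```python
def schedule_cycle(ready_threads):
    """ready_threads: list of (thread_id, target_bank) pairs.
    Returns the thread ids issued this cycle, at most one per bank."""
    issued, banks_used = [], set()
    for tid, bank in ready_threads:
        if bank not in banks_used:  # skip threads that would cause a bank conflict
            banks_used.add(bank)
            issued.append(tid)
    return issued

# Threads 0 and 2 both target bank 1; only thread 0 is issued this cycle,
# and thread 2 waits for a later cycle.
picked = schedule_cycle([(0, 1), (1, 3), (2, 1), (3, 0)])
```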
-
Publication Number: US20230096188A1
Publication Date: 2023-03-30
Application Number: US17485262
Filing Date: 2021-09-24
Applicant: Intel Corporation
Inventor: Karol A. SZERSZEN , Prasoonkumar SURTI , Abhishek R. APPU , John H. FEIT
Abstract: Examples include techniques for a fast clear of a 3-dimensional (3D) surface. Examples include re-describing a 3D surface as a 2-dimensional (2D) surface, using various dimensions of the 3D surface as inputs to an algorithm that outputs the 2D surface as a re-description of the 3D surface. The algorithm also includes additional inputs associated with a tiling mode used to read or write the 3D surface to a graphics display and a bits-per-pixel format to output the 2D surface. The width and height of the output 2D surface are included in a clear command to cause the 3D surface to be cleared.
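One plausible shape for such a re-description is to convert the row width to bytes, stack the depth slices vertically, and align both dimensions to the tile size. The tile dimensions and the particular mapping below are assumptions for illustration, not the algorithm claimed in the patent.

```python
def redescribe_3d_as_2d(width, height, depth, bpp, tile_w=128, tile_h=32):
    """Map a (width x height x depth) 3D surface at bpp bits per pixel onto a
    2D surface, aligning to a hypothetical tile of tile_w bytes x tile_h rows."""
    def align(value, alignment):
        return (value + alignment - 1) // alignment * alignment

    width_2d = align(width * bpp // 8, tile_w)  # row pitch in bytes, tile-aligned
    height_2d = align(height, tile_h) * depth   # depth slices stacked vertically
    return width_2d, height_2d

# 256x100x4 surface at 32 bpp: 1024-byte rows, 128-row-aligned slices x 4.
w2d, h2d = redescribe_3d_as_2d(width=256, height=100, depth=4, bpp=32)
```

The resulting `(w2d, h2d)` pair is what would be placed in the clear command.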
-
Publication Number: US20220309731A1
Publication Date: 2022-09-29
Application Number: US17839303
Filing Date: 2022-06-13
Applicant: Intel Corporation
Inventor: Joydeep RAY , Abhishek R. APPU , Pattabhiraman K , Balaji VEMBU , Altug KOKER , Niranjan L. COORAY , Josh B. MASTRONARDE
Abstract: An apparatus and method are described for allocating local memories to virtual machines. For example, one embodiment of an apparatus comprises: a command streamer to queue commands from a plurality of virtual machines (VMs) or applications, the commands to be distributed from the command streamer and executed by graphics processing resources of a graphics processing unit (GPU); a tile cache to store graphics data associated with the plurality of VMs or applications as the commands are executed by the graphics processing resources; and tile cache allocation hardware logic to allocate a first portion of the tile cache to a first VM or application and a second portion of the tile cache to a second VM or application; the tile cache allocation hardware logic to further allocate a first region in system memory to store spill-over data when the first portion of the tile cache and/or the second portion of the tile cache becomes full.
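The partition-plus-spill-over behaviour can be modelled with a small sketch: each VM gets a fixed tile-cache partition, and writes beyond that capacity go to a system-memory spill region. The capacities, interface, and spill policy here are illustrative assumptions.

```python
class TileCache:
    def __init__(self, partitions):
        # partitions: {vm_id: capacity in tiles}, as set by the allocation logic
        self.capacity = dict(partitions)
        self.used = {vm: 0 for vm in partitions}
        self.spill = {vm: [] for vm in partitions}  # system-memory spill region

    def write_tile(self, vm, tile):
        if self.used[vm] < self.capacity[vm]:
            self.used[vm] += 1
            return "cache"
        self.spill[vm].append(tile)  # partition full: spill over to memory
        return "spill"

# vm0 gets a 2-tile partition; its third tile spills to system memory.
tc = TileCache({"vm0": 2, "vm1": 4})
results = [tc.write_tile("vm0", t) for t in range(3)]
```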
-
Publication Number: US20220084329A1
Publication Date: 2022-03-17
Application Number: US17539083
Filing Date: 2021-11-30
Applicant: Intel Corporation
Inventor: Barath LAKSHAMANAN , Linda L. HURD , Ben J. ASHBAUGH , Elmoustapha OULD-AHMED-VALL , Liwei MA , Jingyi JIN , Justin E. GOTTSCHLICH , Chandrasekaran SAKTHIVEL , Michael S. STRICKLAND , Brian T. LEWIS , Lindsey KUPER , Altug KOKER , Abhishek R. APPU , Prasoonkumar SURTI , Joydeep RAY , Balaji VEMBU , Javier S. TUREK , Naila FAROOQUI
IPC: G07C5/00 , G05D1/00 , G08G1/01 , H04W28/08 , H04L29/08 , G06N20/00 , G06F9/50 , G01C21/34 , B60W30/00 , G06N3/04 , G06N3/063 , G06N3/08 , G06N20/10
Abstract: An autonomous vehicle is provided that includes one or more processors configured to provide a local compute manager to manage execution of compute workloads associated with the autonomous vehicle. The local compute manager can perform various compute operations, including receiving compute operations offloaded from other compute nodes and offloading compute operations to other compute nodes, where the other compute nodes can be other autonomous vehicles. The local compute manager can also facilitate autonomous navigation functionality.
-
Publication Number: US20210142438A1
Publication Date: 2021-05-13
Application Number: US16683024
Filing Date: 2019-11-13
Applicant: Intel Corporation
Inventor: Abhishek R. APPU , Eric G. LISKAY , Prasoonkumar SURTI , Sudhakar KAMMA , Karthik VAIDYANATHAN , Rajasekhar PANTANGI , Altug KOKER , Abhishek RHISHEEKESAN , Shashank LAKSHMINARAYANA , Priyanka LADDA , Karol A. SZERSZEN
Abstract: Examples described herein relate to a decompression engine that can request compressed data to be transferred over a memory bus. In some cases, the memory bus is a width that requires multiple data transfers to transfer the requested data. In a case that requested data is to be presented in-order to the decompression engine, a re-order buffer can be used to store entries of data. When a head-of-line entry is received, the entry can be provided to the decompression engine. When a last entry in a group of one or more entries is received, all entries in the group are presented in-order to the decompression engine. In some examples, a decompression engine can borrow memory resources allocated for use by another memory client to expand a size of re-order buffer available for use. For example, a memory client with excess capacity and a slowest growth rate can be chosen to borrow memory resources from.
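The re-order buffer behaviour described above can be sketched as follows: entries may arrive out of order over the memory bus, but are released to the decompression engine only in sequence-number order, with a newly arrived head-of-line entry also releasing any consecutive entries queued behind it. The interface is a hypothetical simplification.

```python
class ReorderBuffer:
    def __init__(self):
        self.pending = {}   # out-of-order arrivals awaiting release
        self.next_seq = 0   # sequence number of the head-of-line entry

    def receive(self, seq, data):
        """Store an arriving entry; return all entries now releasable in order."""
        self.pending[seq] = data
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released

# Entries arrive out of order; the engine still sees A, B, C, D in order.
rob = ReorderBuffer()
out = []
for seq, data in [(1, "B"), (0, "A"), (3, "D"), (2, "C")]:
    out.extend(rob.receive(seq, data))
```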
-
Publication Number: US20250068588A1
Publication Date: 2025-02-27
Application Number: US18822815
Filing Date: 2024-09-03
Applicant: Intel Corporation
Inventor: Joydeep RAY , Aravindh ANANTARAMAN , Abhishek R. APPU , Altug KOKER , Elmoustapha OULD-AHMED-VALL , Valentin ANDREI , Subramaniam MAIYURAN , Nicolas GALOPPO VON BORRIES , Varghese GEORGE , Mike MACPHERSON , Ben ASHBAUGH , Murali RAMADOSS , Vikranth VEMULAPALLI , William SADLER , Jonathan PEARCE , Sungye KIM
Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of operations that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
-
Publication Number: US20240177264A1
Publication Date: 2024-05-30
Application Number: US18536581
Filing Date: 2023-12-12
Applicant: Intel Corporation
Inventor: Joydeep RAY , Abhishek R. APPU , Altug KOKER , Balaji VEMBU
IPC: G06T1/20 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0875 , G06F12/0888 , G06T1/60
CPC classification number: G06T1/20 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0875 , G06F12/0888 , G06T1/60 , G06F2212/1024 , G06F2212/302 , G06F2212/455 , G06F2212/621
Abstract: An apparatus and method are described for managing data which is biased towards a processor or a GPU. For example, an apparatus comprises a processor comprising one or more cores, one or more cache levels, and cache coherence controllers to maintain coherent data in the one or more cache levels; a graphics processing unit (GPU) to execute graphics instructions and process graphics data, wherein the GPU and processor cores are to share a virtual address space for accessing a system memory; a GPU memory addressable through the virtual address space shared by the processor cores and GPU; and bias management circuitry to store an indication for whether the data has a processor bias or a GPU bias, wherein if the data has a GPU bias, the data is to be accessed by the GPU without necessarily accessing the processor's cache coherence controllers.
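A toy model of the bias mechanism: pages marked GPU-biased are accessed by the GPU directly, bypassing the host's cache-coherence controllers, while processor-biased pages take the coherent path. The table structure, page granularity, and default-to-processor-bias policy are illustrative assumptions.

```python
GPU, CPU = "gpu", "cpu"

class BiasTable:
    def __init__(self):
        self.bias = {}  # page -> GPU or CPU; unlisted pages assumed CPU-biased

    def set_bias(self, page, owner):
        self.bias[page] = owner

    def gpu_access_path(self, page):
        # GPU-biased pages skip the host cache-coherence controllers
        if self.bias.get(page, CPU) == GPU:
            return "direct"
        return "coherent"

bt = BiasTable()
bt.set_bias(0x1000, GPU)
paths = (bt.gpu_access_path(0x1000), bt.gpu_access_path(0x2000))
```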
-
Publication Number: US20230297513A1
Publication Date: 2023-09-21
Application Number: US17699062
Filing Date: 2022-03-18
Applicant: Intel Corporation
Inventor: Prasoonkumar SURTI , Tobias ZIRR , Abhishek R. APPU , Anton KAPLANYAN , Pawel MAJEWSKI , Joshua BARCZAK
IPC: G06F12/0897 , G06N20/00
CPC classification number: G06F12/0897 , G06N20/00 , G06F2212/60
Abstract: A cache streaming apparatus and method for machine learning. For example, one embodiment of an apparatus comprises: a plurality of compute units to perform machine learning operations; a cache subsystem comprising a hierarchy of cache levels, at least some of the cache levels shared by two or more of the plurality of compute units; and data streaming hardware logic to stream machine learning data in and out of the cache subsystem based on the machine learning operations, the data streaming hardware logic to load data into the cache subsystem from memory before the data is needed by a first portion of the machine learning operations and to ensure that results produced by the first portion of machine learning operations are maintained in the cache subsystem until used by a second portion of the machine learning operations.
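The "load before the data is needed" idea can be sketched as a double-buffered stream: while one tile is being computed on, the next tile's load is already issued. The loop below models this serially; real streaming hardware would overlap the load and the compute. All names and callables are illustrative.

```python
def stream(tiles, load, compute):
    """Prefetch the next tile while computing on the current one (modelled
    serially here; real data streaming logic overlaps the two)."""
    results = []
    nxt = load(tiles[0])           # warm-up load before the first compute
    for i in range(len(tiles)):
        cur = nxt
        if i + 1 < len(tiles):
            nxt = load(tiles[i + 1])  # issued before compute needs it
        results.append(compute(cur))
    return results

# Hypothetical stages: load scales a tile into the cache, compute consumes it.
out = stream([1, 2, 3], load=lambda t: t * 10, compute=lambda d: d + 1)
```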