-
Publication No.: US20220068005A1
Publication Date: 2022-03-03
Application No.: US17463320
Filing Date: 2021-08-31
Applicant: Intel Corporation
Inventor: John G. GIERACH , Karthik VAIDYANATHAN , Thomas F. RAOUX
Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.
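The accumulate-then-dispatch behaviour described in this abstract can be illustrated with a minimal sketch. The class name `SortingUnit`, the `request` method, and the per-subroutine queue layout are hypothetical and chosen only for illustration; the call stack mentioned in the abstract is not modelled.

```python
from collections import defaultdict


class SortingUnit:
    """Hypothetical model: queue subroutine requests, release SIMD-width batches."""

    def __init__(self, simd_lanes):
        self.simd_lanes = simd_lanes
        # subroutine id -> argument sets waiting for a full batch
        self.pending = defaultdict(list)

    def request(self, subroutine_id, args):
        """Queue one request; return a full batch once simd_lanes have accumulated."""
        queue = self.pending[subroutine_id]
        queue.append(args)
        if len(queue) == self.simd_lanes:
            batch = list(queue)
            queue.clear()
            return batch  # one instantiation per SIMD lane
        return None  # keep accumulating
```

With `simd_lanes=4`, the first three requests for a given subroutine return `None` and the fourth returns all four argument sets together, ready to run one per lane.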
-
Publication No.: US20210287428A1
Publication Date: 2021-09-16
Application No.: US16819114
Filing Date: 2020-03-15
Applicant: Intel Corporation
Inventor: Sven WOOP , Carsten BENTHIN , Karthik VAIDYANATHAN
Abstract: Apparatus and method for processing motion blur operations. For example, one embodiment of a graphics processing apparatus comprises: a bounding volume hierarchy (BVH) generator to build a BVH comprising hierarchically-arranged BVH nodes based on input primitives, at least one BVH node comprising one or more child nodes; and motion blur processing hardware logic to determine motion values for a quantization grid based on motion values of the one or more child nodes of the at least one BVH node and to map linear bounds of each of the child nodes to the quantization grid.
-
Publication No.: US20210049808A1
Publication Date: 2021-02-18
Application No.: US17003011
Filing Date: 2020-08-26
Applicant: INTEL CORPORATION
Inventor: Scott JANUS , Prasoonkumar SURTI , Karthik VAIDYANATHAN , Alexey SUPIKOV , Gabor LIKTOR , Carsten BENTHIN , Philip LAWS , Michael DOYLE
Abstract: Apparatus and method for a hierarchical beam tracer. For example, one embodiment of an apparatus comprises: a beam generator to generate beam data associated with a beam projected into a graphics scene; a bounding volume hierarchy (BVH) generator to generate BVH data comprising a plurality of hierarchically arranged BVH nodes; a hierarchical beam-based traversal unit to determine whether the beam intersects a current BVH node and, if so, to responsively subdivide the beam into N child beams to test against the current BVH node and/or to traverse further down the BVH hierarchy to select a new BVH node, wherein the hierarchical beam-based traversal unit is to iteratively subdivide successive intersecting child beams and/or to continue to traverse down the BVH hierarchy until a leaf node is reached with which at least one final child beam is determined to intersect; the hierarchical beam-based traversal unit to generate a plurality of rays within the final child beam; and intersection hardware logic to perform intersection testing for any rays intersecting the leaf node, the intersection testing to determine intersections between the rays intersecting the leaf node and primitives bounded by the leaf node.
-
Publication No.: US20210042987A1
Publication Date: 2021-02-11
Application No.: US17079191
Filing Date: 2020-10-23
Applicant: INTEL CORPORATION
Inventor: Karthik VAIDYANATHAN , Michael APODACA , Thomas RAOUX , Carsten BENTHIN , Kai XIAO , Carson BROWNLEE , Joshua BARCZAK
Abstract: An apparatus and method to execute ray tracing instructions. For example, one embodiment of an apparatus comprises execution circuitry to execute a dequantize instruction to convert a plurality of quantized data values to a plurality of dequantized data values, the dequantize instruction including a first source operand to identify a plurality of packed quantized data values in a source register and a destination operand to identify a destination register in which to store a plurality of packed dequantized data values, wherein the execution circuitry is to convert each packed quantized data value in the source register to a floating point value, to multiply the floating point value by a first value to generate a first product and to add the first product to a second value to generate a dequantized data value, and to store the dequantized data value in a packed data element location in the destination register.
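The per-element arithmetic in this abstract (convert to floating point, multiply by a first value, add a second value) can be sketched in software as follows. The function name `dequantize` and the parameter names `scale` and `offset` are assumptions standing in for the abstract's "first value" and "second value"; a Python list stands in for the packed source and destination registers.

```python
def dequantize(packed, scale, offset):
    """Software stand-in for the described instruction.

    packed : iterable of quantized integer values (models the source register)
    scale  : the "first value" each converted element is multiplied by (assumed name)
    offset : the "second value" added to the product (assumed name)
    """
    result = []
    for q in packed:
        as_float = float(q)           # convert the quantized value to floating point
        product = as_float * scale    # first product
        result.append(product + offset)  # dequantized data value
    return result
```

For example, `dequantize([0, 1, 2], 0.5, 1.0)` yields `[1.0, 1.5, 2.0]`.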
-
Publication No.: US20250053452A1
Publication Date: 2025-02-13
Application No.: US18774583
Filing Date: 2024-07-16
Applicant: Intel Corporation
Inventor: Pawel MAJEWSKI , Prasoonkumar SURTI , Karthik VAIDYANATHAN , Joshua BARCZAK , Vasanth RANGANATHAN , Vikranth VEMULAPALLI
Abstract: Apparatus and method for stack access throttling for synchronous ray tracing. For example, one embodiment of an apparatus comprises: ray tracing acceleration hardware to manage active ray tracing stack allocations to ensure that a size of the active ray tracing stack allocations remains within a threshold; and an execution unit to execute a thread to explicitly request a new ray tracing stack allocation from the ray tracing acceleration hardware, the ray tracing acceleration hardware to permit the new ray tracing stack allocation if the size of the active ray tracing stack allocations will remain within the threshold after permitting the new ray tracing stack allocation.
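The admission check described in this abstract can be sketched as a simple counter compared against a threshold. The class name `StackAllocator`, its methods, and the choice of an inclusive comparison (`<=`) are assumptions for illustration; the abstract does not specify them.

```python
class StackAllocator:
    """Hypothetical model of throttled ray tracing stack allocation."""

    def __init__(self, threshold):
        self.threshold = threshold  # upper bound on total active allocation size
        self.active = 0             # size of currently active allocations

    def request(self, size):
        """Permit the allocation only if the total stays within the threshold."""
        if self.active + size <= self.threshold:
            self.active += size
            return True
        return False  # the requesting thread would have to wait or retry

    def release(self, size):
        """Return a previously granted allocation."""
        self.active = max(0, self.active - size)
```

A thread whose request is refused can retry after another thread calls `release`, which is one way the total could be kept within the threshold.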
-
Publication No.: US20240104825A1
Publication Date: 2024-03-28
Application No.: US18376098
Filing Date: 2023-10-03
Applicant: INTEL CORPORATION
Inventor: Karol SZERSZEN , Prasoonkumar SURTI , Gabor LIKTOR , Karthik VAIDYANATHAN , Sven WOOP
CPC classification number: G06T15/06 , G06T1/20 , G06T15/005 , G06T15/08 , G06T17/10
Abstract: Apparatus and method for grouping rays based on quantized ray directions. For example, one embodiment of an apparatus comprises: a ray generator to generate a plurality of rays; ray direction evaluation circuitry/logic to generate approximate ray direction data for each of the plurality of rays; and ray sorting circuitry/logic to sort the rays into a plurality of ray queues based, at least in part, on the approximate ray direction data.
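One simple way to picture sorting rays by approximate direction is to bucket them by direction octant. This is only an illustrative choice of quantization (three sign bits); the abstract does not say which approximation is used, and the names `quantize_direction` and `sort_rays` are hypothetical.

```python
from collections import defaultdict


def quantize_direction(direction):
    """Reduce a 3D direction to a 3-bit octant code built from the sign bits.

    Assumption: the octant stands in for the "approximate ray direction data".
    """
    dx, dy, dz = direction
    return int(dx < 0) | (int(dy < 0) << 1) | (int(dz < 0) << 2)


def sort_rays(rays):
    """Group (origin, direction) rays into queues keyed by quantized direction."""
    queues = defaultdict(list)
    for origin, direction in rays:
        queues[quantize_direction(direction)].append((origin, direction))
    return dict(queues)
```

Rays that land in the same queue point in roughly the same direction, which is the property the grouping relies on.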
-
Publication No.: US20240020911A1
Publication Date: 2024-01-18
Application No.: US17826090
Filing Date: 2022-05-26
Applicant: Intel Corporation
Inventor: Michael NORRIS , Abhishek R. APPU , Prasoonkumar SURTI , Karthik VAIDYANATHAN
Abstract: Apparatus and method for routing data from ray tracing cache banks. For example, one embodiment of an apparatus comprises: ray traversal hardware logic to perform traversal operations to traverse rays through a bounding volume hierarchy (BVH) comprising a plurality of BVH nodes, the ray traversal hardware logic comprising a plurality of traversal storage banks to store traversal data associated with the BVH nodes and/or the rays as the ray traversal hardware logic performs the traversal operations; and a cache comprising a plurality of cache banks to store the traversal data prior to being moved into the traversal storage banks for processing by the ray traversal hardware logic; and an inter-bank interconnect comprising: a point-to-point switch matrix to couple any of the cache banks to any of the traversal storage banks; an arbiter/allocator to control the point-to-point switch matrix to establish a particular group of interconnections between the cache banks and the traversal storage banks in a given clock cycle.
-
Publication No.: US20230297419A1
Publication Date: 2023-09-21
Application No.: US17699992
Filing Date: 2022-03-21
Applicant: Intel Corporation
Inventor: Abhishek R. APPU , Joydeep RAY , Karthik VAIDYANATHAN , Sreedhar CHALASANI , Vasanth RANGANATHAN
IPC: G06F9/48 , G06F9/50 , G06F12/0891
CPC classification number: G06F9/4881 , G06F9/505 , G06F9/5016 , G06F12/0891
Abstract: Bank aware thread scheduling and early dependency clearing techniques are described herein. In one example, bank aware thread scheduling involves arbitrating and scheduling threads based on the cache bank that is to be accessed by the instructions to avoid bank conflicts. Early dependency clearing involves clearing dependencies for cache loads in a scoreboard before the data is loaded. In early dependency clearing for loads, delays in operation can be reduced by clearing dependencies before data is required from the cache.
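The bank-aware part of this abstract can be pictured as an arbiter that, in each cycle, issues at most one thread per cache bank. The sketch below is an assumption-laden illustration: the function name `pick_conflict_free`, the `(thread_id, bank)` tuples, and the priority-order policy are hypothetical, and early dependency clearing is not modelled.

```python
def pick_conflict_free(ready):
    """Choose threads for one cycle so that no two access the same cache bank.

    ready : list of (thread_id, bank) pairs in priority order (assumed input shape)
    Returns (issued, deferred); deferred threads wait for a later cycle.
    """
    used_banks = set()
    issued, deferred = [], []
    for thread_id, bank in ready:
        if bank in used_banks:
            deferred.append((thread_id, bank))  # would conflict with an issued thread
        else:
            used_banks.add(bank)
            issued.append((thread_id, bank))
    return issued, deferred
```

Given `[("t0", 0), ("t1", 0), ("t2", 1)]`, threads `t0` and `t2` issue together and `t1` is deferred because it targets the same bank as `t0`.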
-
Publication No.: US20220254090A1
Publication Date: 2022-08-11
Application No.: US17677118
Filing Date: 2022-02-22
Applicant: INTEL CORPORATION
Inventor: Prasoonkumar SURTI , Carsten BENTHIN , Karthik VAIDYANATHAN , Philip LAWS , Scott JANUS , Sven WOOP
Abstract: Cluster of acceleration engines to accelerate intersections. For example, one embodiment of an apparatus comprises: a set of graphics cores to execute a first set of instructions of a primary graphics thread; a scalar cluster comprising a plurality of scalar execution engines; and a communication fabric interconnecting the set of graphics cores and the scalar cluster; the set of graphics cores to offload execution of a second set of instructions associated with ray traversal and/or intersection operations to the scalar cluster; the scalar cluster comprising a plurality of local memories, each local memory associated with one of the scalar execution engines, wherein each local memory is to store a portion of a hierarchical acceleration data structure required by an associated scalar execution engine to execute one or more of the second set of instructions; the plurality of scalar execution engines to store results of the execution of the second set of instructions in a memory accessible by the set of graphics cores; wherein the set of graphics cores are to process the results within the primary graphics thread.
-
Publication No.: US20210142438A1
Publication Date: 2021-05-13
Application No.: US16683024
Filing Date: 2019-11-13
Applicant: Intel Corporation
Inventor: Abhishek R. APPU , Eric G. LISKAY , Prasoonkumar SURTI , Sudhakar KAMMA , Karthik VAIDYANATHAN , Rajasekhar PANTANGI , Altug KOKER , Abhishek RHISHEEKESAN , Shashank LAKSHMINARAYANA , Priyanka LADDA , Karol A. SZERSZEN
Abstract: Examples described herein relate to a decompression engine that can request compressed data to be transferred over a memory bus. In some cases, the memory bus has a width that requires multiple data transfers to transfer the requested data. In a case that requested data is to be presented in-order to the decompression engine, a re-order buffer can be used to store entries of data. When a head-of-line entry is received, the entry can be provided to the decompression engine. When a last entry in a group of one or more entries is received, all entries in the group are presented in-order to the decompression engine. In some examples, a decompression engine can borrow memory resources allocated for use by another memory client to expand the size of the re-order buffer available for use. For example, a memory client with excess capacity and a slowest growth rate can be chosen to borrow memory resources from.
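The in-order delivery described in this abstract can be sketched with a small re-order buffer keyed by sequence number. The class name `ReorderBuffer`, the integer sequence numbers, and the dictionary storage are assumptions for illustration; grouping of entries and the borrowing of memory from other clients are not modelled.

```python
class ReorderBuffer:
    """Hypothetical model: hold out-of-order entries, release them in order."""

    def __init__(self):
        self.next_seq = 0  # sequence number of the head-of-line entry
        self.slots = {}    # seq -> data for entries that arrived early

    def receive(self, seq, data):
        """Store one arriving entry and return every entry now deliverable in order."""
        self.slots[seq] = data
        released = []
        # Once the head-of-line entry is present, drain all contiguous entries.
        while self.next_seq in self.slots:
            released.append(self.slots.pop(self.next_seq))
            self.next_seq += 1
        return released
```

Entries arriving ahead of the head-of-line entry are held; when the head arrives, it and any contiguous successors are handed to the consumer in order.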