-
Publication No.: US20210097750A1
Publication date: 2021-04-01
Application No.: US16585880
Filing date: 2019-09-27
Applicant: Intel Corporation
Inventor: Sven Woop, Prasoonkumar Surti, Karthik Vaidyanathan, Carsten Benthin, Joshua Barczak, Saikat Mandal
Abstract: An apparatus and method for merging primitives and coordinating between vertex and ray transformations on a shared transformation unit. For example, one embodiment of a graphics processor comprises: a queue comprising a plurality of entries; ordering circuitry/logic to order triangles front to back within the queue; pairing circuitry/logic to identify triangles in the queue sharing an edge and to merge the triangles sharing an edge to produce merged triangle pairs; and shared transformation circuitry to alternate between performing vertex transformations on vertices of the merged triangle pairs and performing ray transformations on ray direction/origin data.
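The pairing step in this abstract can be illustrated with a small software sketch. It is not the patented circuitry: the abstract does not say how a partner is chosen, so the greedy "first later triangle that shares an edge" policy, the index-triple representation, and the names `edges` and `pair_triangles` are all assumptions made for illustration. The front-to-back ordering step is assumed to have already been applied to the queue.

```python
from typing import List, Optional, Tuple

Triangle = Tuple[int, int, int]  # three vertex indices (assumed representation)


def edges(tri: Triangle) -> set:
    """Return the three undirected edges of a triangle."""
    a, b, c = tri
    return {frozenset((a, b)), frozenset((b, c)), frozenset((c, a))}


def pair_triangles(queue: List[Triangle]) -> List[Tuple[Triangle, Optional[Triangle]]]:
    """Greedily merge each triangle with the first later, still unpaired
    triangle that shares an edge. Unmatched triangles get partner None.
    The greedy policy is an assumption, not taken from the abstract."""
    used = [False] * len(queue)
    pairs = []
    for i, tri in enumerate(queue):
        if used[i]:
            continue
        used[i] = True
        partner = None
        for j in range(i + 1, len(queue)):
            if not used[j] and edges(tri) & edges(queue[j]):
                used[j] = True
                partner = queue[j]
                break
        pairs.append((tri, partner))
    return pairs


if __name__ == "__main__":
    # Two triangles share edge {1, 2}; the third stands alone.
    print(pair_triangles([(0, 1, 2), (2, 1, 3), (4, 5, 6)]))
```

A merged pair shares two vertices, so it carries four distinct vertices instead of six, which is presumably where the saving in vertex transformations comes from.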
-
Publication No.: US10699370B1
Publication date: 2020-06-30
Application No.: US16235604
Filing date: 2018-12-28
Applicant: Intel Corporation
Inventor: Karthik Vaidyanathan, Sven Woop, Carsten Benthin
Abstract: Apparatus and method for a compressed stack representation for a BVH. For example, one embodiment of an apparatus comprises: a ray generator to generate a plurality of rays in a first graphics scene; a bounding volume hierarchy (BVH) generator to construct a BVH comprising a plurality of hierarchically arranged nodes, wherein the BVH comprises a specified number of child nodes at a current BVH level beneath a parent node in the hierarchy; traversal/intersection circuitry to traverse one or more of the rays through the hierarchically arranged nodes of the BVH and intersect the one or more rays with primitives contained within the nodes; a short traversal stack of a fixed size comprising a specified number of entries fewer than the number of child nodes beneath the parent node, each entry associated with a child node at the current BVH level, the entries ordered from top to bottom within the short traversal stack based on a sorted distance of each respective child node, wherein each entry includes a field to indicate whether that entry is associated with a final child in the current BVH level; wherein the traversal/intersection circuitry is to process entries from the top of the traversal stack, removing entries as they are processed, the traversal/intersection circuitry to determine that a current entry is associated with the final child node at the current BVH level by reading a first value in the field.
-
Publication No.: US12175589B2
Publication date: 2024-12-24
Application No.: US18228777
Filing date: 2023-08-01
Applicant: Intel Corporation
Inventor: Ingo Wald, Carsten Benthin, Sven Woop
Abstract: Apparatus and method for programmable ray tracing with hardware acceleration on a graphics processor. For example, one embodiment of a graphics processor comprises shader execution circuitry to execute a plurality of programmable ray tracing shaders. The shader execution circuitry includes a plurality of single instruction multiple data (SIMD) execution units. Sorting circuitry regroups data associated with one or more of the programmable ray tracing shaders to increase occupancy for SIMD operations performed by the SIMD execution units; and fixed-function intersection circuitry coupled to the shader execution circuitry detects intersections between rays and bounding volume hierarchies (BVHs) and/or objects contained therein and provides results indicating the intersections to the sorting circuitry.
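The idea of regrouping work to raise SIMD occupancy can be sketched in software. The abstract does not specify the sort key or the batching scheme, so grouping rays by the shader they will invoke, the `(ray_id, shader_id)` work-item format, and the function names below are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

WorkItem = Tuple[int, Hashable]        # (ray_id, shader_id), assumed format
Batch = Tuple[Hashable, List[int]]     # (shader_id, ray_ids in one SIMD batch)


def regroup_by_shader(items: List[WorkItem], simd_width: int) -> List[Batch]:
    """Bucket rays by shader, then cut each bucket into batches of at most
    simd_width rays so every lane in a batch runs the same shader."""
    buckets: Dict[Hashable, List[int]] = defaultdict(list)
    for ray_id, shader_id in items:
        buckets[shader_id].append(ray_id)
    batches: List[Batch] = []
    for shader_id in sorted(buckets):
        rays = buckets[shader_id]
        for start in range(0, len(rays), simd_width):
            batches.append((shader_id, rays[start:start + simd_width]))
    return batches


def occupancy(batches: List[Batch], simd_width: int) -> float:
    """Fraction of SIMD lanes that carry a ray, across all batches."""
    if not batches:
        return 0.0
    active = sum(len(rays) for _, rays in batches)
    return active / (len(batches) * simd_width)


if __name__ == "__main__":
    work = [(0, "hit"), (1, "miss"), (2, "hit"), (3, "hit"), (4, "miss"), (5, "hit")]
    batches = regroup_by_shader(work, simd_width=4)
    print(batches, occupancy(batches, simd_width=4))
```

Without regrouping, a batch that mixes shaders has to execute each shader with part of its lanes masked off; bucketing first keeps each batch on a single shader.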
-
Publication No.: US20240282045A1
Publication date: 2024-08-22
Application No.: US18589239
Filing date: 2024-02-27
Applicant: Intel Corporation
Inventor: Sven Woop, Prasoonkumar Surti, Karthik Vaidyanathan, Carsten Benthin, Joshua Barczak, Saikat Mandal
CPC classification number: G06T15/06, G06T15/005, G06T2210/21
Abstract: An apparatus and method for merging primitives and coordinating between vertex and ray transformations on a shared transformation unit. For example, one embodiment of a graphics processor comprises: a queue comprising a plurality of entries; ordering circuitry/logic to order triangles front to back within the queue; pairing circuitry/logic to identify triangles in the queue sharing an edge and to merge the triangles sharing an edge to produce merged triangle pairs; and shared transformation circuitry to alternate between performing vertex transformations on vertices of the merged triangle pairs and performing ray transformations on ray direction/origin data.
-
Publication No.: US11989818B2
Publication date: 2024-05-21
Application No.: US17533531
Filing date: 2021-11-23
Applicant: Intel Corporation
Inventor: Carsten Benthin, Sven Woop
Abstract: An apparatus and method for efficiently reconstructing a BVH. For example, one embodiment of a method comprises: constructing an object bounding volume hierarchy (BVH) for each object in a scene, each object BVH including a root node and one or more child nodes based on primitives included in each object; constructing a top-level BVH using the root nodes of the individual object BVHs; performing an analysis of the top-level BVH to determine whether the top-level BVH comprises a sufficiently efficient arrangement of nodes within its hierarchy; and reconstructing at least a portion of the top-level BVH if a more efficient arrangement of nodes exists, wherein reconstructing comprises rebuilding the portion of the top-level BVH until one or more stopping criteria have been met, the stopping criteria defined to prevent an entire rebuilding of the top-level BVH.
-
Publication No.: US11915369B2
Publication date: 2024-02-27
Application No.: US16819120
Filing date: 2020-03-15
Applicant: Intel Corporation
Inventor: Karthik Vaidyanathan, Carsten Benthin, Sven Woop
CPC classification number: G06T17/10, G06F7/24, G06T1/20, G06T15/005, G06T15/06, G06T15/08, G06T17/205
Abstract: Apparatus and method for box-box testing. For example, one embodiment of a processor comprises: a bounding volume hierarchy (BVH) generator to construct a BVH comprising a plurality of hierarchically arranged BVH nodes; traversal circuitry to traverse query boxes through the BVH, the traversal circuitry to read a BVH node from a top of a BVH node stack and to read a query box from a local storage or memory, the traversal circuitry further comprising: box-box testing circuitry and/or logic to compare maximum and minimum X, Y, and Z coordinates of the BVH node and the query box and to generate an overlap indication if overlap is detected for each of the X, Y, and Z dimensions; distance determination circuitry and/or logic to generate a distance value representing an extent of overlap between the BVH node and the query box; and sorting circuitry and/or logic to sort the BVH node within a set of one or more additional BVH nodes based on the distance value.
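The box-box test described above reduces to per-axis interval comparisons, which a short sketch can make concrete. The abstract does not define the "distance value", so using the overlap volume as that value, sorting larger overlaps first, and the names `Box`, `overlaps`, `overlap_extent`, and `sort_nodes_by_overlap` are assumptions for illustration.

```python
from typing import List, NamedTuple, Tuple

Vec3 = Tuple[float, float, float]


class Box(NamedTuple):
    lo: Vec3  # minimum X, Y, Z
    hi: Vec3  # maximum X, Y, Z


def overlaps(a: Box, b: Box) -> bool:
    """True only if the intervals overlap on each of the X, Y, and Z axes."""
    return all(a.lo[i] <= b.hi[i] and b.lo[i] <= a.hi[i] for i in range(3))


def overlap_extent(a: Box, b: Box) -> float:
    """Assumed metric: volume of the intersection box, 0.0 if disjoint."""
    extent = 1.0
    for i in range(3):
        length = min(a.hi[i], b.hi[i]) - max(a.lo[i], b.lo[i])
        if length < 0.0:
            return 0.0
        extent *= length
    return extent


def sort_nodes_by_overlap(nodes: List[Box], query: Box) -> List[Box]:
    """Keep nodes that overlap the query box, largest overlap first
    (the sort direction is an assumption)."""
    hits = [n for n in nodes if overlaps(n, query)]
    return sorted(hits, key=lambda n: overlap_extent(n, query), reverse=True)


if __name__ == "__main__":
    query = Box((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
    nodes = [
        Box((1.0, 1.0, 1.0), (3.0, 3.0, 3.0)),
        Box((0.0, 0.0, 0.0), (1.5, 2.0, 2.0)),
        Box((5.0, 5.0, 5.0), (6.0, 6.0, 6.0)),
    ]
    print(sort_nodes_by_overlap(nodes, query))
```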
-
Publication No.: US11887243B2
Publication date: 2024-01-30
Application No.: US17533341
Filing date: 2021-11-23
Applicant: Intel Corporation
Inventor: Karthik Vaidyanathan, Sven Woop, Carsten Benthin
CPC classification number: G06T15/06, G06T1/60, G06T9/40, G06T17/005, G06T2210/12, G06T2210/21
Abstract: Apparatus and method for preventing re-traversal of a prior path on a restart. For example, one embodiment of an apparatus comprises: a ray generator to generate a plurality of rays in a graphics scene; a bounding volume hierarchy (BVH) generator to construct a BVH comprising a plurality of hierarchically arranged nodes, wherein the BVH comprises a specified number of child nodes at a current BVH level beneath a parent node in the hierarchy; circuitry to traverse one or more of the rays through the BVH to form a current traversal path and intersect the one or more rays with primitives contained within the nodes, wherein the circuitry is to process entries from the top of a first data structure comprising entries each associated with a child node at the current BVH level, the entries being ordered from top to bottom based on a sorted distance of each respective child node.
-
Publication No.: US20230090973A1
Publication date: 2023-03-23
Application No.: US17480528
Filing date: 2021-09-21
Applicant: Intel Corporation
Inventor: Joydeep Ray, Abhishek R. Appu, Timothy R. Bauer, James Valerio, Weiyu Chen, Subramaniam Maiyuran, Prasoonkumar Surti, Karthik Vaidyanathan, Carsten Benthin, Sven Woop, Jiasheng Chen
Abstract: One embodiment provides a graphics processor including a processing resource including a register file, memory, a cache memory, and load/store/cache circuitry to process load, store, and prefetch messages from the processing resource. The circuitry includes support for an immediate address offset that will be used to adjust the address supplied for a memory access to be requested by the circuitry. Including support for the immediate address offset removes the need to execute additional instructions to adjust the address to be accessed prior to execution of the memory access instruction.
-
Publication No.: US20170178387A1
Publication date: 2017-06-22
Application No.: US14975294
Filing date: 2015-12-18
Applicant: Intel Corporation
Inventor: Sven Woop, Carsten Benthin, Rasmus Barringer, Tomas G. Akenine-Moller
CPC classification number: G06T15/06, G06T15/005, G06T15/08, G06T2210/08, G06T2210/12
Abstract: Embodiments provide for a graphics processing apparatus including a graphics processing unit having bounding volume logic to operate on a compressed bounding volume hierarchy, wherein each bounding volume node stores a parent bounding volume and multiple child bounding volumes that are encoded relative to the parent bounding volume.
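Encoding child bounds relative to a parent can be illustrated with a simple quantization scheme. The abstract does not state the encoding, so the 8-bit uniform grid per axis, the conservative rounding (floor for minima, ceil for maxima), and the names `encode_child` and `decode_child` are assumptions for this sketch.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
QVec3 = Tuple[int, int, int]

BITS = 8                     # assumed bits per quantized coordinate
LEVELS = (1 << BITS) - 1     # 255 grid steps across the parent box


def encode_child(parent_lo: Vec3, parent_hi: Vec3,
                 child_lo: Vec3, child_hi: Vec3) -> Tuple[QVec3, QVec3]:
    """Quantize child bounds onto a grid spanning the parent box.
    Minima round down and maxima round up, so the decoded box is intended
    to enclose the original child (up to floating-point rounding)."""
    q_lo, q_hi = [], []
    for p0, p1, c0, c1 in zip(parent_lo, parent_hi, child_lo, child_hi):
        scale = LEVELS / (p1 - p0) if p1 > p0 else 0.0
        q_lo.append(max(0, math.floor((c0 - p0) * scale)))
        q_hi.append(min(LEVELS, math.ceil((c1 - p0) * scale)))
    return tuple(q_lo), tuple(q_hi)


def decode_child(parent_lo: Vec3, parent_hi: Vec3,
                 q_lo: QVec3, q_hi: QVec3) -> Tuple[Vec3, Vec3]:
    """Reconstruct approximate child bounds from the quantized offsets."""
    lo = tuple(p0 + q * (p1 - p0) / LEVELS
               for p0, p1, q in zip(parent_lo, parent_hi, q_lo))
    hi = tuple(p0 + q * (p1 - p0) / LEVELS
               for p0, p1, q in zip(parent_lo, parent_hi, q_hi))
    return lo, hi


if __name__ == "__main__":
    parent = ((0.0, 0.0, 0.0), (255.0, 255.0, 255.0))
    child = ((10.5, 20.0, 30.0), (40.2, 50.0, 60.0))
    q_lo, q_hi = encode_child(*parent, *child)
    print(q_lo, q_hi, decode_child(*parent, q_lo, q_hi))
```

Under these assumptions each child box takes six bytes instead of six 32-bit floats, at the cost of slightly looser bounds.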
-
Publication No.: US11989815B2
Publication date: 2024-05-21
Application No.: US17677118
Filing date: 2022-02-22
Applicant: Intel Corporation
Inventor: Prasoonkumar Surti, Carsten Benthin, Karthik Vaidyanathan, Philip Laws, Scott Janus, Sven Woop
CPC classification number: G06T15/005, G06T1/20, G06T15/06, G06T2210/52
Abstract: Cluster of acceleration engines to accelerate intersections. For example, one embodiment of an apparatus comprises: a set of graphics cores to execute a first set of instructions of a primary graphics thread; a scalar cluster comprising a plurality of scalar execution engines; and a communication fabric interconnecting the set of graphics cores and the scalar cluster; the set of graphics cores to offload execution of a second set of instructions associated with ray traversal and/or intersection operations to the scalar cluster; the scalar cluster comprising a plurality of local memories, each local memory associated with one of the scalar execution engines, wherein each local memory is to store a portion of a hierarchical acceleration data structure required by an associated scalar execution engine to execute one or more of the second set of instructions; the plurality of scalar execution engines to store results of the execution of the second set of instructions in a memory accessible by the set of graphics cores; wherein the set of graphics cores are to process the results within the primary graphics thread.