-
Publication No.: US20220254090A1
Publication Date: 2022-08-11
Application No.: US17677118
Filing Date: 2022-02-22
Applicant: INTEL CORPORATION
Inventor: Prasoonkumar SURTI , Carsten BENTHIN , Karthik VAIDYANATHAN , Philip LAWS , Scott JANUS , Sven WOOP
Abstract: Cluster of acceleration engines to accelerate intersections. For example, one embodiment of an apparatus comprises: a set of graphics cores to execute a first set of instructions of a primary graphics thread; a scalar cluster comprising a plurality of scalar execution engines; and a communication fabric interconnecting the set of graphics cores and the scalar cluster; the set of graphics cores to offload execution of a second set of instructions associated with ray traversal and/or intersection operations to the scalar cluster; the scalar cluster comprising a plurality of local memories, each local memory associated with one of the scalar execution engines, wherein each local memory is to store a portion of a hierarchical acceleration data structure required by an associated scalar execution engine to execute one or more of the second set of instructions; the plurality of scalar execution engines to store results of the execution of the second set of instructions in a memory accessible by the set of graphics cores; wherein the set of graphics cores are to process the results within the primary graphics thread.
-
Publication No.: US20220084329A1
Publication Date: 2022-03-17
Application No.: US17539083
Filing Date: 2021-11-30
Applicant: Intel Corporation
Inventor: Barath LAKSHAMANAN , Linda L. HURD , Ben J. ASHBAUGH , Elmoustapha OULD-AHMED-VALL , Liwei MA , Jingyi JIN , Justin E. GOTTSCHLICH , Chandrasekaran SAKTHIVEL , Michael S. STRICKLAND , Brian T. LEWIS , Lindsey KUPER , Altug KOKER , Abhishek R. APPU , Prasoonkumar SURTI , Joydeep RAY , Balaji VEMBU , Javier S. TUREK , Naila FAROOQUI
IPC: G07C5/00 , G05D1/00 , G08G1/01 , H04W28/08 , H04L29/08 , G06N20/00 , G06F9/50 , G01C21/34 , B60W30/00 , G06N3/04 , G06N3/063 , G06N3/08 , G06N20/10
Abstract: An autonomous vehicle is provided that includes one or more processors configured to provide a local compute manager to manage execution of compute workloads associated with the autonomous vehicle. The local compute manager can perform various compute operations, including receiving offloaded compute operations from other compute nodes and offloading compute operations to other compute nodes, where the other compute nodes can be other autonomous vehicles. The local compute manager can also facilitate autonomous navigation functionality.
-
Publication No.: US20210142438A1
Publication Date: 2021-05-13
Application No.: US16683024
Filing Date: 2019-11-13
Applicant: Intel Corporation
Inventor: Abhishek R. APPU , Eric G. LISKAY , Prasoonkumar SURTI , Sudhakar KAMMA , Karthik VAIDYANATHAN , Rajasekhar PANTANGI , Altug KOKER , Abhishek RHISHEEKESAN , Shashank LAKSHMINARAYANA , Priyanka LADDA , Karol A. Szerszen
Abstract: Examples described herein relate to a decompression engine that can request compressed data to be transferred over a memory bus. In some cases, the memory bus has a width that requires multiple data transfers to transfer the requested data. Where requested data is to be presented in order to the decompression engine, a re-order buffer can be used to store entries of data. When a head-of-line entry is received, the entry can be provided to the decompression engine. When the last entry in a group of one or more entries is received, all entries in the group are presented in order to the decompression engine. In some examples, a decompression engine can borrow memory resources allocated for use by another memory client to expand the size of the re-order buffer available for use. For example, the memory client with excess capacity and the slowest growth rate can be chosen as the one to borrow memory resources from.
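The in-order release behaviour described above can be sketched in Python. This is a minimal software model, not the patented circuit: it assumes each transfer carries a sequence tag assigned at request time, and the class name `ReorderBuffer` is hypothetical. It covers only head-of-line release; group completion and borrowing capacity from other memory clients are not modelled.

```python
from typing import Dict, List


class ReorderBuffer:
    """Release data entries to a consumer strictly in request order.

    Hypothetical model: every transfer carries a sequence tag assigned when it
    was requested, and entries may arrive over the memory bus out of order.
    """

    def __init__(self) -> None:
        self._pending: Dict[int, bytes] = {}  # out-of-order arrivals, keyed by tag
        self._next_tag = 0                    # tag of the current head-of-line entry

    def receive(self, tag: int, data: bytes) -> List[bytes]:
        """Store an arriving entry and return every entry now releasable in order."""
        self._pending[tag] = data
        released: List[bytes] = []
        # Drain for as long as the head-of-line entry is present.
        while self._next_tag in self._pending:
            released.append(self._pending.pop(self._next_tag))
            self._next_tag += 1
        return released
```

For example, if tag 2 arrives first it is held; when tag 0 arrives only tag 0 is released, and when tag 1 arrives tags 1 and 2 are released together.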
-
Publication No.: US20150348222A1
Publication Date: 2015-12-03
Application No.: US14292064
Filing Date: 2014-05-30
Applicant: Intel Corporation
Inventor: Prasoonkumar SURTI , Aditya NAVALE
CPC classification number: G06T1/20
Abstract: An apparatus and method for identifying sub-groups of execution resources for parallel pixel processing. For example, one embodiment of a method comprises: determining X and Y coordinates for a pixel block to be processed; performing a first set of one or more modulus operations using even bits from the X and Y coordinates to generate a first intermediate result; performing a second set of one or more modulus operations using odd bits from the X and Y coordinates to generate a second intermediate result; comparing the first intermediate result and the second intermediate result to generate a final result; and using the final result to select a first set of processing resources from a set of N processing resources for processing the pixel block.
-
Publication No.: US20240265487A1
Publication Date: 2024-08-08
Application No.: US18433823
Filing Date: 2024-02-06
Applicant: Intel Corporation
Inventor: Saikat MANDAL , Prasoonkumar SURTI , Sven WOOP
CPC classification number: G06T1/20 , G06F7/02 , G06F7/24 , G06F7/505 , G06F9/3885 , G06T15/005 , G06T15/08 , G06T17/10
Abstract: Apparatus and method for stable and short latency sorting. For example, one embodiment of a processor comprises: an input circuit to receive a set of N input values to be sorted into a sorted order; comparison circuitry to compare each input value with all other input values in parallel to generate at least N*(N−1)/2 comparison result values; matrix generation circuitry and/or logic to generate a result matrix having a row associated with each input value, a plurality of bits in each row comprising comparison result values indicating results of comparisons with other input values, wherein a first region of the result matrix is to store a first set of bits comprising the N*(N−1)/2 comparison result values and a second region of the result matrix, opposite the first region, is to store a second set of bits comprising an inverse of the N*(N−1)/2 comparison result values; a parallel adder circuit to perform parallel additions of the bits in each row to generate N unique result values; and sorting circuitry to index into the N unique result values to return the sorted order.
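The comparison-matrix scheme can be illustrated with a sequential Python sketch. The function name `matrix_rank_sort` is hypothetical, and the hardware would perform the comparisons and row additions in parallel rather than in loops. The sketch assumes ties are broken by input position, which is one way to obtain the stability the abstract claims: for i < j the upper region records whether element j must precede element i, the lower region stores the inverse, and each row sum becomes a unique rank.

```python
from typing import List, Sequence


def matrix_rank_sort(values: Sequence[int]) -> List[int]:
    """Return input positions in stable sorted order using a comparison matrix."""
    n = len(values)
    # m[i][j] == 1 means element j must be placed before element i.
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Upper region: one of the N*(N-1)/2 comparisons. For i < j, element j
            # goes first only if strictly smaller, so equal values keep input order.
            m[i][j] = 1 if values[j] < values[i] else 0
            # Lower region: the inverse of the mirrored comparison result.
            m[j][i] = 1 - m[i][j]
    # Row sums (the parallel-adder step) give each element a unique rank 0..N-1.
    ranks = [sum(row) for row in m]
    # Index into the ranks to produce the sorted order of input positions.
    order = [0] * n
    for i, rank in enumerate(ranks):
        order[rank] = i
    return order
```

For `[3, 1, 3, 0]` this returns `[3, 1, 0, 2]`, i.e. the values `0, 1, 3, 3` with the two 3s in their original order. Because the ranks form a permutation of 0..N-1, the final step is a plain scatter with no further comparisons.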
-
Publication No.: US20230205704A1
Publication Date: 2023-06-29
Application No.: US17561652
Filing Date: 2021-12-23
Applicant: Intel Corporation
Inventor: Prasoonkumar SURTI , Vidhya KRISHNAN , Abhishek R. APPU , Karol A. SZERSZEN , Lakshminarayanan STRIRAMASSARMA
IPC: G06F12/0897
CPC classification number: G06F12/0897 , G06F2212/401
Abstract: A graphics processor includes multiple levels of memory units, including a memory device and a cache device located near a graphics component. The graphics processor includes distributed compression/decompression, including a module between the cache device and the memory device. The module can perform compression of write data when the write data is moved from the cache device to the memory device, and perform decompression of read data when the read data is moved from the memory device to the cache device. The graphics processor can include a second level of cache with another compression module between the first level of cache and the second level of cache.
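A minimal sketch of where the compression module sits in the data path, assuming a single cache level and using Python's `zlib` purely as a stand-in for whatever lossless format the hardware actually uses. `CompressingMemoryPath`, `write_back`, and `fill` are hypothetical names.

```python
import zlib
from typing import Dict


class CompressingMemoryPath:
    """Hypothetical model of a compression/decompression module placed between
    a cache and backing memory: lines live uncompressed in the cache and
    compressed in memory."""

    def __init__(self) -> None:
        self.cache: Dict[int, bytes] = {}   # address -> uncompressed line
        self.memory: Dict[int, bytes] = {}  # address -> compressed line

    def write_back(self, addr: int) -> None:
        # Moving write data from cache to memory: compress on the way out.
        self.memory[addr] = zlib.compress(self.cache.pop(addr))

    def fill(self, addr: int) -> bytes:
        # Moving read data from memory to cache: decompress on the way in.
        line = zlib.decompress(self.memory[addr])
        self.cache[addr] = line
        return line
```

A second cache level would add another such module between the two caches, following the same compress-on-write, decompress-on-read pattern.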
-
Publication No.: US20230060900A1
Publication Date: 2023-03-02
Application No.: US17959872
Filing Date: 2022-10-04
Applicant: INTEL CORPORATION
Inventor: Christopher J. HUGHES , Jonathan D. PEARCE , Guei-Yuan LUEH , ElMoustapha OULD-AHMED-VALL , Jorge E. PARRA , Prasoonkumar SURTI , Krishna N. VINOD , Ronen ZOHAR
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes identifying data element values that are associated with one another based on the indices, performing one or more reduction operations on the associated data element values based on the identification, and storing results of the one or more reduction operations in the output register.
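One way to model the index-driven reduction in Python, under stated assumptions: lanes with equal indices are treated as associated, the default operation is addition, and every output lane receives the reduction of its group. The abstract does not say how results are laid out in the output register, so that layout, and the name `indexed_reduce`, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def indexed_reduce(
    values: Sequence[float],
    indices: Sequence[int],
    op: Callable[[float, float], float] = lambda a, b: a + b,
) -> List[float]:
    """Reduce input lanes that share an index; each output lane gets its group's result."""
    if len(values) != len(indices):
        raise ValueError("values and indices must have the same number of lanes")
    # First pass: combine all lanes that carry the same index.
    groups: Dict[int, float] = {}
    for value, key in zip(values, indices):
        groups[key] = op(groups[key], value) if key in groups else value
    # Second pass: broadcast each group's result back to its lanes (assumed layout).
    return [groups[key] for key in indices]
```

With values `[1, 2, 3, 4]` and indices `[0, 1, 0, 1]`, lanes 0 and 2 form one group and lanes 1 and 3 another, giving `[4, 6, 4, 6]`; passing `max` as `op` turns it into a max-reduction.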
-
Publication No.: US20220130106A1
Publication Date: 2022-04-28
Application No.: US17517113
Filing Date: 2021-11-02
Applicant: INTEL CORPORATION
Inventor: Brent E. INSKO , Prasoonkumar SURTI
IPC: G06T15/40 , G06T15/00 , H04N13/279
Abstract: An apparatus and method are described for performing an early depth test on graphics data. For example, one embodiment of a graphics processing apparatus comprises: early depth test circuitry to perform an early depth test on blocks of pixels to determine whether all pixels in the block of pixels can be resolved by the early depth test; a plurality of execution circuits to execute pixel shading operations on the blocks of pixels; and a scheduler circuit to schedule the blocks of pixels for the pixel shading operations, the scheduler circuit to prioritize the blocks of pixels in accordance with the determination as to whether all pixels in the block of pixels can be resolved by the early depth test.
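The scheduling policy can be sketched with a priority queue. The abstract does not state which class of block is favoured; this sketch assumes blocks fully resolved by the early depth test are dispatched ahead of the rest, with arrival order kept within each class. `PixelBlockScheduler` is a hypothetical name.

```python
import heapq
from typing import List, Tuple


class PixelBlockScheduler:
    """Dispatch pixel blocks, favouring those the early depth test fully resolved (assumed policy)."""

    def __init__(self) -> None:
        # Heap entries: (priority class, arrival sequence, block id).
        self._heap: List[Tuple[int, int, int]] = []
        self._seq = 0

    def submit(self, block_id: int, fully_resolved: bool) -> None:
        # Lower priority value is dispatched first.
        priority = 0 if fully_resolved else 1
        heapq.heappush(self._heap, (priority, self._seq, block_id))
        self._seq += 1

    def next_block(self) -> int:
        # Pop the highest-priority block; ties go to the earliest arrival.
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The arrival sequence number in each heap entry keeps ordering deterministic within a priority class, so blocks of the same class are shaded in submission order.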
-
Publication No.: US20210407177A1
Publication Date: 2021-12-30
Application No.: US17368335
Filing Date: 2021-07-06
Applicant: Intel Corporation
Inventor: Scott JANUS , Prasoonkumar SURTI , Karthik VAIDYANATHAN , Carsten BENTHIN , Philip LAWS
Abstract: Apparatus and method for ray tracing acceleration using a grid primitive. For example, one embodiment of an apparatus comprises: a grid primitive generator to generate a grid primitive comprising a plurality of adjacent interconnected primitives; a bitmask generator to generate a bitmask associated with the grid primitive, the bitmask comprising a plurality of bitmask values, each bitmask value associated with a primitive of the grid primitive; a ray tracing engine comprising traversal and intersection hardware logic to perform traversal and intersection operations in which rays are traversed through a hierarchical acceleration data structure and intersections between the rays and one or more of the adjacent interconnected primitives are identified, wherein the ray tracing engine is to read the bitmask to determine a first set of primitives from the grid primitive on which to perform the traversal and intersection operations and a second set of primitives from the grid primitive on which the traversal and intersection operations will not be performed.
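A simplified Python sketch of how a bitmask can gate intersection work within one grid primitive. Assumptions: the grid is a hypothetical set of unit quads in the z = 0 plane, each split into two triangles that share vertices; bit k of the mask enables triangle k in generation order; the ray-triangle test is the standard Möller-Trumbore algorithm; and traversal of the hierarchical acceleration structure is omitted. `make_grid` and `intersect_grid` are hypothetical names.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(orig: Vec3, dirn: Vec3, tri: Triangle, eps: float = 1e-9) -> Optional[float]:
    """Möller-Trumbore test: return the hit distance t, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(dirn, e2)
    det = dot(e1, p)
    if abs(det) < eps:  # ray parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(dirn, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None


def make_grid(nx: int, ny: int) -> List[Triangle]:
    """Hypothetical grid primitive: nx*ny unit quads in z = 0, two triangles per quad."""
    tris: List[Triangle] = []
    for j in range(ny):
        for i in range(nx):
            a, b = (float(i), float(j), 0.0), (float(i + 1), float(j), 0.0)
            c, d = (float(i), float(j + 1), 0.0), (float(i + 1), float(j + 1), 0.0)
            tris += [(a, b, d), (a, d, c)]  # the two triangles share vertices a and d
    return tris


def intersect_grid(orig: Vec3, dirn: Vec3, tris: List[Triangle],
                   mask: int) -> Optional[Tuple[int, float]]:
    """Return (triangle index, t) of the closest hit among enabled triangles, or None."""
    best: Optional[Tuple[int, float]] = None
    for k, tri in enumerate(tris):
        if not (mask >> k) & 1:
            continue  # bit clear: this primitive is skipped entirely
        t = ray_triangle(orig, dirn, tri)
        if t is not None and (best is None or t < best[1]):
            best = (k, t)
    return best
```

A ray fired straight down at (0.75, 0.25) hits triangle 0 of a 2x1 grid at t = 1.0 when the mask is `0b1111`, but finds nothing when bit 0 is cleared, because the point lies only inside triangle 0.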
-
Publication No.: US20210295463A1
Publication Date: 2021-09-23
Application No.: US16823741
Filing Date: 2020-03-19
Applicant: Intel Corporation
Inventor: Saikat MANDAL , Prasoonkumar SURTI , Sven WOOP
Abstract: Apparatus and method for stable and short latency sorting. For example, one embodiment of a processor comprises: an input circuit to receive a set of N input values to be sorted into a sorted order; comparison circuitry to compare each input value with all other input values in parallel to generate at least N*(N−1)/2 comparison result values; matrix generation circuitry and/or logic to generate a result matrix having a row associated with each input value, a plurality of bits in each row comprising comparison result values indicating results of comparisons with other input values, wherein a first region of the result matrix is to store a first set of bits comprising the N*(N−1)/2 comparison result values and a second region of the result matrix, opposite the first region, is to store a second set of bits comprising an inverse of the N*(N−1)/2 comparison result values; a parallel adder circuit to perform parallel additions of the bits in each row to generate N unique result values; and sorting circuitry to index into the N unique result values to return the sorted order.