-
公开(公告)号:US20240281249A1
公开(公告)日:2024-08-22
申请号:US18170808
申请日:2023-02-17
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Joydeep Ray , Karthik Vaidyanathan , Sreedhar Chalasani , Eric Liskay , Prathamesh Raghunath Shinde , Vasanth Ranganathan , Michael J. Norris , Rajasekhar Pantangi
CPC classification number: G06F9/30043 , G06F9/30047 , G06F9/546
Abstract: One embodiment provides a graphics processor comprising memory access circuitry configured to receive a message from an instruction execution resource and determine a destination for the message, the destination one of shared function circuitry of a graphics core or a set of memory banks within the graphics core. The memory access circuitry then routes the message to the shared function circuitry in response to a determination that the message is directed to the shared function circuitry or routes the message to a message sequencer associated with the instruction execution resource in response to a determination that the message is directed to the set of memory banks.
-
公开(公告)号:US20240220254A1
公开(公告)日:2024-07-04
申请号:US18148997
申请日:2022-12-30
Applicant: Intel Corporation
Inventor: Chunhui Mei , Yongsheng Liu , John A. Wiegert , Vasanth Ranganathan , Ben J. Ashbaugh , Fangwen Fu , Hong Jiang , Guei-Yuan Lueh , James Valerio , Alan M. Curtis , Maxim Kazakov
CPC classification number: G06F9/30087 , G06F9/3877 , G06F9/5072 , G06F9/544
Abstract: Data multicast in compute core clusters is described. An example of an apparatus includes one or more processors including at least a first processor, the first processor including one or more clusters of cores and a memory, wherein each cluster of cores includes multiple cores, each core including one or more processing resources, shared memory, and broadcast circuitry; and wherein a first core in a first cluster of cores is to request a data element, determine whether any additional cores in the first cluster require the data element, and, upon determining that one or more additional cores in the first cluster require the data element, broadcast the data element to the one or more additional cores via interconnects between the broadcast circuitry of the cores of the first core cluster.
-
公开(公告)号:US20240161227A1
公开(公告)日:2024-05-16
申请号:US18532245
申请日:2023-12-07
Applicant: Intel Corporation
Inventor: Abhishek Appu , Subramaniam Maiyuran , Mike Macpherson , Fangwen Fu , Jiasheng Chen , Varghese George , Vasanth Ranganathan , Ashutosh Garg , Joydeep Ray
IPC: G06T1/20 , G06F7/544 , G06F9/50 , G06F12/0806 , G06F15/80 , G06F17/16 , G06N3/048 , G06N3/08 , G06N3/084
CPC classification number: G06T1/20 , G06F7/5443 , G06F9/5027 , G06F12/0806 , G06F15/8046 , G06F17/16 , G06N3/048 , G06N3/08 , G06N3/084
Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
-
公开(公告)号:US11892950B2
公开(公告)日:2024-02-06
申请号:US17865666
申请日:2022-07-15
Applicant: Intel Corporation
Inventor: Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, Jr. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei
IPC: G06F12/084 , G06F12/0862 , G06T1/20 , G06T1/60
CPC classification number: G06F12/0862 , G06T1/20 , G06T1/60 , G06F2212/602 , G06F2212/608
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
-
公开(公告)号:US11868264B2
公开(公告)日:2024-01-09
申请号:US18168157
申请日:2023-02-13
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Joydeep Ray , David Puffer , Prasoonkumar Surti , Lakshminarayanan Striramassarma , Vasanth Ranganathan , Kiran C. Veernapu , Balaji Vembu , Pattabhiraman K
IPC: G06F12/0877 , G06F12/0802 , G06F12/0855 , G06F12/0806 , G06F12/0846 , G06F12/0868 , G06T1/60 , G06F12/126 , G06F12/0893
CPC classification number: G06F12/0877 , G06F12/0802 , G06F12/0806 , G06F12/0848 , G06F12/0855 , G06F12/0868 , G06F12/126 , G06T1/60 , G06F12/0893
Abstract: One embodiment provides circuitry coupled with cache memory and a memory interface, the circuitry to compress compute data at multiple cache line granularity, and a processing resource coupled with the memory interface and the cache memory. The processing resource is configured to perform a general-purpose compute operation on compute data associated with multiple cache lines of the cache memory. The circuitry is configured to compress the compute data before a write of the compute data via the memory interface to the memory bus, in association with a read of the compute data associated with the multiple cache lines via the memory interface, decompress the compute data, and provide the decompressed compute data to the processing resource.
-
公开(公告)号:US11861761B2
公开(公告)日:2024-01-02
申请号:US17095590
申请日:2020-11-11
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Durgaprasad Bilagi , Joydeep Ray , Scott Janus , Sanjeev Jahagirdar , Brent Insko , Lidong Xu , Abhishek R. Appu , James Holland , Vasanth Ranganathan , Nikos Kaburlasos , Altug Koker , Xinmin Tian , Guei-Yuan Lueh , Changliang Wang
IPC: G06T1/60 , G06T1/20 , G06F12/0802 , G06N5/04
CPC classification number: G06T1/60 , G06F12/0802 , G06N5/04 , G06T1/20 , G06F2212/251
Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output results to the streaming buffer. The compute core is operable to perform AI inference processing based on data consumed from the streaming buffer and output AI inference processing results to the memory.
-
公开(公告)号:US20230306681A1
公开(公告)日:2023-09-28
申请号:US18189873
申请日:2023-03-24
Applicant: Intel Corporation
Inventor: Saikat Mandal , Vasanth Ranganathan
CPC classification number: G06T15/405 , G06T1/20 , G06T1/60
Abstract: Embodiments described herein provide for a technique to improve the culling efficiency of coarse depth testing. One embodiment provides for a graphics processor that is configured to perform a method to track a history of source fragments that are tested against a destination tile. When a combination of partial fragments sum to full coverage, the most conservative source far depth value is used instead of the previous destination far depth value. When the combination sums to partial coverage, the previous destination far depth value is retained.
-
公开(公告)号:US20230260075A1
公开(公告)日:2023-08-17
申请号:US18305904
申请日:2023-04-24
Applicant: Intel Corporation
Inventor: Subramaniam Maiyuran , Durgaprasad Bilagi , Joydeep Ray , Scott Janus , Sanjeev Jahagirdar , Brent Insko , Lidong Xu , Abhishek R. Appu , James Holland , Vasanth Ranganathan , Nikos Kaburlasos , Altug Koker , Xinmin Tian , Guei-Yuan Lueh , Changliang Wang
IPC: G06T1/60 , G06T1/20 , G06F12/0802 , G06N5/04
CPC classification number: G06T1/60 , G06T1/20 , G06F12/0802 , G06N5/04 , G06F2212/251
Abstract: Embodiments described herein are generally directed to improvements relating to power, latency, bandwidth and/or performance issues relating to GPU processing/caching. According to one embodiment, a state of multiple intellectual property (IP) cores that have access to a common cache via a central fabric is observed. Responsive to the observed state being indicative of performance of a standalone workload by a first IP core of the multiple IP cores, the common cache is treated as a local cache of the first IP core by powering off the central fabric and causing the first IP core to access the common cache via a low power access path between the first IP core and the common cache that is outside of the central fabric.
-
公开(公告)号:US20230102538A1
公开(公告)日:2023-03-30
申请号:US17484066
申请日:2021-09-24
Applicant: Intel Corporation
Inventor: Joydeep Ray , Prathamesh Raghunath Shinde , Ben J. Ashbaugh , Wei-Yu Chen , Abhishek R. Appu , Vasanth Ranganathan , Dmitry Yurievich Babokin , Ankur N. Shah
Abstract: Embodiments are directed to systems and methods for supporting generic pointers in hardware of a GPU. According to one embodiment, a GPU includes multiple sub-cores each having a processing resource and a load/store pipeline. The processing resource is operable to receive a memory access message including a pointer and a memory type identifier indicative of the pointer representing a generic pointer. The processing resource is further operable to output a load or store operation to the load/store pipeline based on the memory access message, including computing an address for the load or store operation by adding a base address of a named memory type of a plurality of named memory types referenced by the generic pointer to an offset into a memory of the named memory type. The load/store pipeline is operable to, responsive to receipt of the load or store operation, access the memory at the address.
-
公开(公告)号:US11615584B2
公开(公告)日:2023-03-28
申请号:US17382888
申请日:2021-07-22
Applicant: Intel Corporation
Inventor: Vasanth Ranganathan , Saikat Mandal , Saurabh Sharma , Vamsee Vardhan Chivukula , Karol A. Szerszen , Aleksander Olek Neyman , Altug Koker , Prasoonkumar Surti , Abhishek Appu , Joydeep Ray , Art Hunter , Luis F. Cruz Camacho , Akshay R. Chada
Abstract: Briefly, in accordance with one or more embodiments, a processor performs a coarse depth test on pixel data, and performs a final depth test on the pixel data. Coarse depth data is stored in a coarse depth cache, and per pixel depth data is stored in a per pixel depth cache. If a result of the coarse depth test is ambiguous, the processor is to read the per pixel depth data from the per pixel depth cache, and to update the coarse depth data with the per pixel depth data if the per pixel depth data has a smaller depth range than the coarse depth data.
-
-
-
-
-
-
-
-
-