-
公开(公告)号:US20240256456A1
公开(公告)日:2024-08-01
申请号:US18391346
申请日:2023-12-20
Applicant: Intel Corporation
Inventor: Vikranth Vemulapalli , Lakshminarayanan Striramassarma , Mike MacPherson , Aravindh Anantaraman , Ben Ashbaugh , Murali Ramadoss , William B. Sadler , Jonathan Pearce , Scott Janus , Brent Insko , Vasanth Ranganathan , Kamal Sinha , Arthur Hunter, Jr. , Prasoonkumar Surti , Nicolas Galoppo von Borries , Joydeep Ray , Abhishek R. Appu , ElMoustapha Ould-Ahmed-Vall , Altug Koker , Sungye Kim , Subramaniam Maiyuran , Valentin Andrei
IPC: G06F12/0862 , G06T1/20 , G06T1/60
CPC classification number: G06F12/0862 , G06T1/20 , G06T1/60 , G06F2212/602 , G06F2212/608
Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the Li cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
-
公开(公告)号:US12014183B2
公开(公告)日:2024-06-18
申请号:US17949904
申请日:2022-09-21
Applicant: Intel Corporation
Inventor: John Wiegert , Joydeep Ray , Timothy Bauer , James Valerio
CPC classification number: G06F9/3887 , G06F9/355 , G06F15/7839 , G06F9/30036 , G06F9/30043
Abstract: Embodiments described herein provide a technique to decompose 64-bit per-lane virtual addresses to access a plurality of data elements on behalf of a multi-lane parallel processing execution resource of a graphics or compute accelerator. The 64-bit per-lane addresses are decomposed into a base address and a plurality of per-lane offsets for transmission to memory access circuitry. The memory access circuitry then combines the base address and the per-lane offsets to reconstruct the per-lane addresses.
-
13.
公开(公告)号:US20240184572A1
公开(公告)日:2024-06-06
申请号:US18528340
申请日:2023-12-04
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06F17/16 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N20/00 , G06T15/00 , G09G5/393
CPC classification number: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F9/30025 , G06F9/3013 , G06F17/16 , G06F2207/3824 , G06N20/00 , G06T15/005
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
-
公开(公告)号:US20240134527A1
公开(公告)日:2024-04-25
申请号:US17971290
申请日:2022-10-20
Applicant: Intel Corporation
Inventor: Joydeep Ray , Michael Apodaca , Yoav Harel , Guei-Yuan Lueh , John A. Wiegert
CPC classification number: G06F3/061 , G06F3/0655 , G06F3/0679 , G06T1/60
Abstract: Embodiments described herein provide a technique to enable access to entries in a surface state or sampler state using 64-bit virtual addresses. One embodiment provides a graphics core that includes memory access circuitry configured to facilitate access to the memory by functional units of the graphics core. The memory access circuitry is configured to receive a message to access an entry in a surface state or a sampler state associated with a parallel processing operation. The message specifies a base address for a surface state entry or sampler state entry. The circuitry can add the base address and the offset to determine a 64-bit virtual address for the entry in the surface state entry or the sampler state and submit a memory access request to the memory to access the entry of the surface state or sampler state.
-
公开(公告)号:US11948224B2
公开(公告)日:2024-04-02
申请号:US17978573
申请日:2022-11-01
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
IPC: G06T1/20 , G06F3/14 , G06F7/483 , G06F9/30 , G06F9/38 , G06F9/50 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G06N3/084 , G06N20/00 , G06T1/60 , G06T15/00
CPC classification number: G06T1/20 , G06F7/483 , G06F9/30014 , G06F9/30185 , G06F9/3863 , G06F9/5044 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N20/00 , G06F3/14 , G06T1/60 , G06T15/005
Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.
-
公开(公告)号:US20240087077A1
公开(公告)日:2024-03-14
申请号:US17944542
申请日:2022-09-14
Applicant: Intel Corporation
Inventor: Joydeep Ray , Abhishek R. Appu , Prathamesh Raghunath Shinde , John Wiegert
Abstract: Embodiments described herein provide a technique to merge partial cache line writes to a cache memory. One embodiment provides a graphics processor comprising a graphics core, a cache coupled with the graphics core, and memory access circuitry to process memory access messages received from the graphics core. The memory access circuitry includes partial cache line write merge circuitry configured to merge a first partial write to a cache line of the cache with a second partial write to the cache line of the cache.
-
公开(公告)号:US20240086356A1
公开(公告)日:2024-03-14
申请号:US18491474
申请日:2023-10-20
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , Varghese George , Mike Macpherson , Aravindh Anantaraman , Abhishek R. Appu , Elmoustapha Ould-Ahmed-Vall , Nicolas Galoppo von Borries , Ben J. Ashbaugh
IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46
CPC classification number: G06F15/7839 , G06F7/5443 , G06F7/575 , G06F7/588 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/3004 , G06F9/30043 , G06F9/30047 , G06F9/30065 , G06F9/30079 , G06F9/3887 , G06F9/5011 , G06F9/5077 , G06F12/0215 , G06F12/0238 , G06F12/0246 , G06F12/0607 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/8046 , G06F17/16 , G06F17/18 , G06T1/20 , G06T1/60 , H03M7/46 , G06T15/06
Abstract: Embodiments described herein provide techniques to facilitate instruction-based control of memory attributes. One embodiment provides a graphics processor comprising a processing resource, a memory device, a cache coupled with the processing resources and the memory, and circuitry to process a memory access message received from the processing resource. The memory access message enables access to data of the memory device. To process the memory access message, the circuitry is configured to determine one or more cache attributes that indicate whether the data should be read from or stored the cache. The cache attributes may be provided by the memory access message or stored in state data associated with the data to be accessed by the access message.
-
公开(公告)号:US20240086138A1
公开(公告)日:2024-03-14
申请号:US18474361
申请日:2023-09-26
Applicant: Intel Corporation
Inventor: Eric J. Asperheim , Subramaniam Maiyuran , Kiran C. Veernapu , Sanjeev S. Jahagirdar , Balaji Vembu , Devan Burke , Philip R. Laws , Kamal Sinha , Abhishek R. Appu , Elmoustapha Ould-Ahmed-Vall , Peter L. Doyle , Joydeep Ray , Travis T. Schluessler , John H. Feit , Nikos Kaburlasos , Jacek Kwiatkowski , Altug Koker
IPC: G06F3/14 , G06F3/01 , G06F3/0484 , G09G5/391
CPC classification number: G06F3/1438 , G06F3/013 , G06F3/0484 , G09G5/391 , G09G5/001 , G09G2340/0435 , G09G2352/00 , G09G2354/00 , G09G2360/08 , G09G2360/121
Abstract: In accordance with some embodiments, the render rate is varied across and/or up and down the display screen. This may be done based on where the user is looking in order to reduce power consumption and/or increase performance. Specifically the screen display is separated into regions, such as quadrants. Each of these regions is rendered at a rate determined by at least one of what the user is currently looking at, what the user has looked at in the past and/or what it is predicted that the user will look at next. Areas of less focus may be rendered at a lower rate, reducing power consumption in some embodiments.
-
公开(公告)号:US11915357B2
公开(公告)日:2024-02-27
申请号:US16820483
申请日:2020-03-16
Applicant: Intel Corporation
Inventor: Karthik Vaidyanathan , Abhishek Appu , Vasanth Ranganathan , Joydeep Ray , Prasoonkumar Surti
CPC classification number: G06T15/005 , G06T15/06
Abstract: Apparatus and method for stack throttling. For example, one embodiment of an apparatus comprises: execution circuitry comprising a plurality of functional units to execute a plurality of ray shaders and generate a plurality of primary rays and a corresponding plurality of ray messages; a first in first out (FIFO) buffer to queue the ray messages generated by the EUs; a cache to store one or more of the plurality of primary rays; a memory-backed stack to store a first subset of the plurality of ray messages in a corresponding plurality of entries; memory-backed stack management circuitry to either store a second subset of the plurality of ray messages to the memory-backed stack, or to temporarily store the one or more the second subset of the plurality of ray messages to a memory subsystem based, at least in part, on a number of entries currently occupied by ray messages in the memory-backed stack; and ray traversal circuitry to read a next ray message from the memory-backed stack, retrieve a next primary ray identified by the ray message from the cache or a memory subsystem, and perform traversal operations on the next primary ray.
-
公开(公告)号:US20240054595A1
公开(公告)日:2024-02-15
申请号:US17884755
申请日:2022-08-10
Applicant: Intel Corporation
Inventor: Joydeep Ray , Vasanth Ranganathan , James Valerio , Jeffery S. Boles , Hema Chand Nalluri , Aditya Navale , Ben J. Ashbaugh , Michal Mrozek , Murali Ramadoss , Hong Jiang , Ankur Shah
CPC classification number: G06T1/20 , G06T1/60 , G06F9/3855
Abstract: Embodiments described herein provide a system of concurrent compute queues that enable the scheduling of a large number of compute contexts simultaneously on graphics processor hardware. One embodiment provides an apparatus comprising a system interface and a general-purpose graphics processor coupled with the system interface. The general-purpose graphics processor comprises a plurality of graphics processor hardware resources configured to be partitioned into a plurality of isolated partitions, each of the plurality of isolated partitions including a first command streamer, a second command streamer, and circuitry configured to schedule general-purpose graphics compute workloads submitted to a first plurality of command queues associated with the first command streamer and a second plurality of command queues associated with the second command streamer.
-
-
-
-
-
-
-
-
-