-
Publication Number: US20230195626A1
Publication Date: 2023-06-22
Application Number: US17558008
Application Date: 2021-12-21
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Saurabh Sharma , Jeremy Lukacs , Hashem Hashemi , Gianpaolo Tommasi , Guennadi Riguer , Mark Fowler , Randy Ramsey
IPC: G06F12/0806 , G06F12/10
CPC classification number: G06F12/0806 , G06F12/10 , G06F2212/1016
Abstract: A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.
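As a rough illustration of the remapping this abstract describes, the C++ sketch below reorders a row-major walk of a 2D dispatch into a Morton (Z-order) space-filling curve for read-after-read locality, and reverses a producer pass's walk so a consumer pass starts where the writes finished, for read-after-write locality. The Morton encoding, function names, and data layout are illustrative assumptions, not the patent's actual mechanism.

```cpp
// Sketch: remapping a row-major dispatch of work items to a Morton (Z-order)
// space-filling curve so that consecutively dispatched work items touch
// nearby data (read-after-read locality), and reversing a producer's walk
// for a consumer pass so it first reads the most recently written, and thus
// most likely still cache-resident, data (read-after-write locality).
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Interleave the low 16 bits of x and y into a 32-bit Morton code.
static uint32_t morton2d(uint32_t x, uint32_t y) {
    auto spread = [](uint32_t v) {
        v &= 0xFFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    };
    return spread(x) | (spread(y) << 1);
}

// Translate a width x height dispatch from row-major order to Z-order.
std::vector<std::pair<uint32_t, uint32_t>> zOrderWalk(uint32_t width, uint32_t height) {
    std::vector<std::pair<uint32_t, uint32_t>> walk;
    walk.reserve(static_cast<size_t>(width) * height);
    for (uint32_t y = 0; y < height; ++y)
        for (uint32_t x = 0; x < width; ++x)
            walk.emplace_back(x, y);
    // Sort by Morton code: spatial neighbours stay close in dispatch order.
    std::sort(walk.begin(), walk.end(), [](auto a, auto b) {
        return morton2d(a.first, a.second) < morton2d(b.first, b.second);
    });
    return walk;
}

// Reverse "typewriter" walk: the consumer pass traverses the producer's
// order backwards, consuming the freshest data first.
std::vector<std::pair<uint32_t, uint32_t>> reverseTypewriterWalk(
        const std::vector<std::pair<uint32_t, uint32_t>>& producerWalk) {
    return {producerWalk.rbegin(), producerWalk.rend()};
}
```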
-
Publication Number: US11954757B2
Publication Date: 2024-04-09
Application Number: US17564153
Application Date: 2021-12-28
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Yash Ukidave , Randy Ramsey , Sukanya Chavan , Zhongliang Chen
Abstract: An apparatus, such as a graphics processing unit (GPU), includes one or more processors configured to determine a plurality of first locality information of a received wave at a processing unit and to select, to execute the received wave, a first processing element of a plurality of processing elements, the first processing element having a plurality of second locality information from a previous wave that matches the plurality of first locality information.
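A minimal sketch of the selection step described above, assuming locality information is represented as a set of tags (for example page or cache-line identifiers) and that the match between waves is scored by counting shared tags; the structures and scoring are illustrative assumptions, not drawn from the patent.

```cpp
// Sketch: dispatch an incoming wave to the processing element whose previous
// wave touched the most matching locality tags, so the new wave is likely to
// hit data that element already has resident.
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

using LocalityInfo = std::unordered_set<uint64_t>;  // tags touched by a wave

struct ProcessingElement {
    LocalityInfo lastWaveLocality;  // locality info left by the previous wave
};

// Count how many locality tags of the incoming wave were also touched by a
// processing element's previous wave.
static size_t matchScore(const LocalityInfo& incoming, const LocalityInfo& previous) {
    size_t score = 0;
    for (uint64_t tag : incoming)
        if (previous.count(tag)) ++score;
    return score;
}

// Select the processing element with the best locality match for the wave.
size_t selectProcessingElement(const std::vector<ProcessingElement>& elements,
                               const LocalityInfo& incomingWaveLocality) {
    size_t best = 0;
    size_t bestScore = 0;
    for (size_t i = 0; i < elements.size(); ++i) {
        size_t score = matchScore(incomingWaveLocality, elements[i].lastWaveLocality);
        if (score > bestScore) { bestScore = score; best = i; }
    }
    return best;  // falls back to element 0 when nothing matches
}
```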
-
Publication Number: US11941723B2
Publication Date: 2024-03-26
Application Number: US17564291
Application Date: 2021-12-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Randy Ramsey , Yash Ukidave
Abstract: Systems, methods, and techniques provide dynamic load balancing of workgroup assignments among a group of shader engines by a command processor of a graphics processing unit (GPU). Based on one or more commands received for execution, a plurality of workgroups is generated for assignment to a plurality of shader engines for processing, each shader engine including a respective quantity of active compute units. Each workgroup of the plurality of workgroups is dynamically assigned to a respective shader engine for execution based at least in part on indications of available resources respectively associated with each of the shader engines. In various embodiments, the indications of available resources may include physical parameters regarding each shader engine, as well as current status information regarding the processing of workgroups assigned to each shader engine.
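The sketch below illustrates one way such an assignment loop could look, assuming the "available resources" indication combines a physical parameter (active compute-unit count) with current status (workgroups already in flight on each engine); the scoring formula and names are assumptions, not the claimed implementation.

```cpp
// Sketch: a command processor assigning each workgroup of a dispatch to the
// shader engine with the lowest load per active compute unit, updating the
// dynamic status as assignments are made.
#include <cstddef>
#include <cstdint>
#include <vector>

struct ShaderEngine {
    uint32_t activeComputeUnits;   // physical parameter
    uint32_t inFlightWorkgroups;   // current status, updated as work completes
};

// Pick the engine with the lowest load per active compute unit.
static size_t pickEngine(const std::vector<ShaderEngine>& engines) {
    size_t best = 0;
    double bestLoad = 1e30;
    for (size_t i = 0; i < engines.size(); ++i) {
        if (engines[i].activeComputeUnits == 0) continue;  // skip disabled engines
        double load = static_cast<double>(engines[i].inFlightWorkgroups) /
                      engines[i].activeComputeUnits;
        if (load < bestLoad) { bestLoad = load; best = i; }
    }
    return best;
}

// Assign every workgroup of a dispatch, feeding the status back as we go.
std::vector<size_t> assignWorkgroups(std::vector<ShaderEngine>& engines,
                                     uint32_t workgroupCount) {
    std::vector<size_t> assignment(workgroupCount);
    for (uint32_t wg = 0; wg < workgroupCount; ++wg) {
        size_t engine = pickEngine(engines);
        engines[engine].inFlightWorkgroups++;  // dynamic status feeds back
        assignment[wg] = engine;
    }
    return assignment;
}
```

Because the in-flight count is incremented as each workgroup is assigned, an engine with fewer active compute units naturally receives proportionally fewer workgroups, and an engine that drains its queue quickly becomes preferred again as its status updates.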
-
Publication Number: US20230195509A1
Publication Date: 2023-06-22
Application Number: US17557927
Application Date: 2021-12-21
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Saurabh Sharma , Jeremy Lukacs , Hashem Hashemi , Gianpaolo Tommasi , Guennadi Riguer , Mark Fowler , Randy Ramsey
IPC: G06F9/48
CPC classification number: G06F9/4831
Abstract: A processing unit performs a dispatch walk of a set of thread groups based on a programmable access pattern. The access pattern is stored in a table that is programmed via a specified command. By using the command to program the table with different access patterns, the dispatch order of the set of thread groups is adapted to better suit the processing of different data sets, thereby reducing power consumption at the processing unit and improving overall processing efficiency.
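A small sketch of a table-driven dispatch walk, under the assumption that the programmable pattern is a list of intra-tile offsets that a command overwrites; the table format and traversal are illustrative only, not the patented layout.

```cpp
// Sketch: a dispatch walk whose traversal order over thread groups is read
// from a table that software programs via a command, instead of being fixed
// in hardware. The table holds (dx, dy) offsets visited inside each tile.
#include <cstdint>
#include <utility>
#include <vector>

struct ThreadGroupId { uint32_t x, y; };

// The programmable pattern: relative offsets visited inside each tile.
using WalkTable = std::vector<std::pair<uint32_t, uint32_t>>;

// "Command" that reprograms the walk table with a new access pattern.
void programWalkTable(WalkTable& table, WalkTable newPattern) {
    table = std::move(newPattern);
}

// Walk a gridWidth x gridHeight dispatch tile by tile, visiting thread
// groups inside each tile in the order given by the programmed table.
std::vector<ThreadGroupId> dispatchWalk(const WalkTable& table,
                                        uint32_t gridWidth, uint32_t gridHeight,
                                        uint32_t tileWidth, uint32_t tileHeight) {
    std::vector<ThreadGroupId> order;
    for (uint32_t ty = 0; ty < gridHeight; ty += tileHeight)
        for (uint32_t tx = 0; tx < gridWidth; tx += tileWidth)
            for (auto [dx, dy] : table) {
                uint32_t x = tx + dx, y = ty + dy;
                if (x < gridWidth && y < gridHeight)
                    order.push_back({x, y});
            }
    return order;
}
```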
-
Publication Number: US12299769B2
Publication Date: 2025-05-13
Application Number: US18602733
Application Date: 2024-03-12
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Randy Ramsey , Yash Ukidave
Abstract: Systems, methods, and techniques provide dynamic load balancing of workgroup assignments among a group of shader engines by a command processor of a graphics processing unit (GPU). Based on one or more commands received for execution, a plurality of workgroups is generated for assignment to a plurality of shader engines for processing, each shader engine including a respective quantity of active compute units. Each workgroup of the plurality of workgroups is dynamically assigned to a respective shader engine for execution based at least in part on indications of available resources respectively associated with each of the shader engines. In various embodiments, the indications of available resources may include physical parameters regarding each shader engine, as well as current status information regarding the processing of workgroups assigned to each shader engine.
-
Publication Number: US20230205602A1
Publication Date: 2023-06-29
Application Number: US17564074
Application Date: 2021-12-28
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Yash UKIDAVE , Randy Ramsey , Nishank Pathak , Baturay Turkmen
CPC classification number: G06F9/5083 , G06F9/5038 , G06F9/5044 , G06F9/5016 , G06T1/20
Abstract: Parallel processors typically allocate resources to workloads based on workload priority. Priority inversion of resource allocation between workloads of different priorities reduces the operating efficiency of a parallel processor in some cases. A parallel processor mitigates priority inversion by soft-locking resources to prevent their allocation for the processing of lower priority workloads. Soft-locking is enabled responsive to a soft-lock condition, such as one or more priority inversion heuristics exceeding corresponding thresholds or multiple failed allocations of higher priority workloads within a time period. In some cases, priority inversion heuristics include quantities of higher priority workloads and lower priority workloads that are in-flight or incoming, ratios between such quantities, quantities of render targets, or a combination of these. The soft-lock is released responsive to expiry of a soft-lock timer or incoming or in-flight higher priority workloads falling below a threshold, for example.
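The sketch below shows one plausible shape for such a soft-lock, assuming a failed-allocation count and an in-flight low-to-high priority ratio as the inversion heuristics, and a tick-based timer for release; the thresholds, fields, and units are illustrative assumptions rather than the claimed mechanism.

```cpp
// Sketch: soft-locking parallel-processor resources so they are withheld
// from lower-priority workloads while a priority-inversion condition holds,
// then released when a timer expires or the higher-priority backlog drains.
#include <cstdint>

struct SoftLockState {
    bool     locked = false;
    uint64_t lockExpiryTime = 0;              // release time, in ticks
    uint32_t failedHighPriAllocations = 0;    // bumped on each failed allocation
};

struct Heuristics {
    uint32_t inFlightHighPri;
    uint32_t inFlightLowPri;
};

// Called by the allocator whenever a higher-priority allocation fails.
void noteFailedHighPriorityAllocation(SoftLockState& s) {
    ++s.failedHighPriAllocations;
}

// Enable the soft-lock when higher-priority allocations keep failing or the
// low:high in-flight ratio suggests inversion; release it when the timer
// expires or no higher-priority work remains in flight.
void updateSoftLock(SoftLockState& s, const Heuristics& h, uint64_t now,
                    uint64_t lockDurationTicks) {
    const uint32_t kFailedAllocThreshold = 4;   // assumed threshold
    const uint32_t kRatioThreshold       = 8;   // assumed low:high ratio

    bool inversionSuspected =
        s.failedHighPriAllocations >= kFailedAllocThreshold ||
        (h.inFlightHighPri > 0 &&
         h.inFlightLowPri / h.inFlightHighPri >= kRatioThreshold);

    if (!s.locked && inversionSuspected) {
        s.locked = true;
        s.lockExpiryTime = now + lockDurationTicks;
    } else if (s.locked &&
               (now >= s.lockExpiryTime || h.inFlightHighPri == 0)) {
        s.locked = false;                 // timer expired or backlog drained
        s.failedHighPriAllocations = 0;
    }
}

// Lower-priority allocations are refused while the soft-lock is held.
bool mayAllocateForLowPriority(const SoftLockState& s) { return !s.locked; }
```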
-
Publication Number: US11386518B2
Publication Date: 2022-07-12
Application Number: US16580654
Application Date: 2019-09-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Michael Mantor , Alexander Fuad Ashkar , Randy Ramsey , Mangesh P. Nijasure , Brian Emberling
Abstract: The address of the draw or dispatch packet responsible for creating an exception is tied to the shader/wavefront, linking the exception back to the draw command from which it originated. In various embodiments, a method of operating a graphics pipeline and exception handling includes receiving, at a command processor of a graphics processing unit (GPU), an exception signal indicating an occurrence of a pipeline exception at a shader stage of a graphics pipeline. The shader stage generates the exception signal in response to a pipeline exception and transmits the exception signal to the command processor. The command processor determines, based on the exception signal, an address of a command packet responsible for the occurrence of the pipeline exception.
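As a rough sketch of the traceback, the code below assumes the command processor keeps a wavefront-to-packet-address map populated when packets launch work, and that the exception signal carries the faulting wavefront's identifier; both the signal fields and the lookup structure are illustrative assumptions, not the patented mechanism.

```cpp
// Sketch: a command processor that records which draw/dispatch packet
// launched each wavefront, so an exception signal carrying a wavefront
// identifier can be resolved to the address of the responsible packet.
#include <cstdint>
#include <unordered_map>

struct ExceptionSignal {
    uint32_t wavefrontId;   // reported by the shader stage that faulted
    uint32_t shaderStage;   // which pipeline stage raised the exception
};

class CommandProcessor {
public:
    // Called when the packet at 'packetAddress' launches a wavefront.
    void recordLaunch(uint32_t wavefrontId, uint64_t packetAddress) {
        waveToPacket_[wavefrontId] = packetAddress;
    }

    // Called when an exception signal arrives from a shader stage; returns
    // the address of the command packet responsible, or 0 if unknown.
    uint64_t resolveException(const ExceptionSignal& signal) const {
        auto it = waveToPacket_.find(signal.wavefrontId);
        return it == waveToPacket_.end() ? 0 : it->second;
    }

private:
    std::unordered_map<uint32_t, uint64_t> waveToPacket_;
};
```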