-
Publication No.: US12236529B2
Publication Date: 2025-02-25
Application No.: US17562653
Application Date: 2021-12-27
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Christopher J. Brennan , Randy Wayne Ramsey , Nishank Pathak , Ricky Wai Yeung Iu , Jimshed Mirza , Anthony Chan
Abstract: Systems, apparatuses, and methods for implementing a discard engine in a graphics pipeline are disclosed. A system includes a graphics pipeline with a geometry engine launching shaders that generate attribute data for vertices of each primitive of a set of primitives. The attribute data is consumed by pixel shaders, with each pixel shader generating a deallocation message when the pixel shader no longer needs the attribute data. A discard engine gathers deallocations from multiple pixel shaders and determines when the attribute data is no longer needed. Once a block of attributes has been consumed by all potential pixel shader consumers, the discard engine deallocates the given block of attributes. The discard engine sends a discard command to the caches so that the attribute data can be invalidated and not written back to memory.
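The discard flow in this abstract is essentially reference counting: a block stays live while any potential pixel-shader consumer has not yet sent its deallocation message, and the last message triggers both the free and a cache discard. The sketch below models that under stated assumptions. The names `Cache`, `DiscardEngine`, `register_block` and `on_deallocation` are hypothetical, and the cache is reduced to a dictionary of dirty flags so that the difference between a normal eviction (writeback) and a discard (no writeback) is visible.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Toy cache model: block id -> dirty flag (hypothetical)."""
    lines: dict = field(default_factory=dict)
    writebacks: list = field(default_factory=list)

    def evict(self, block_id):
        # Ordinary eviction: dirty data is written back to memory.
        if self.lines.pop(block_id, False):
            self.writebacks.append(block_id)

    def discard(self, block_id):
        # Discard: invalidate the line with no writeback, even if dirty.
        self.lines.pop(block_id, None)


class DiscardEngine:
    def __init__(self, caches):
        self.caches = caches
        self.pending = {}  # block id -> pixel shaders that may still read it
        self.freed = []    # blocks handed back for reallocation

    def register_block(self, block_id, consumers):
        self.pending[block_id] = set(consumers)

    def on_deallocation(self, block_id, shader_id):
        """Process one deallocation message from a pixel shader."""
        remaining = self.pending[block_id]
        remaining.discard(shader_id)
        if remaining:
            return  # some potential consumer has not finished yet
        del self.pending[block_id]
        self.freed.append(block_id)
        for cache in self.caches:
            cache.discard(block_id)


l2 = Cache(lines={"attr0": True})           # attr0 is dirty in the cache
engine = DiscardEngine([l2])
engine.register_block("attr0", consumers={"ps0", "ps1"})
engine.on_deallocation("attr0", "ps0")      # ps1 may still need the data
engine.on_deallocation("attr0", "ps1")      # last consumer is done
print(engine.freed, l2.lines, l2.writebacks)  # ['attr0'] {} []
```

The dirty line disappears without ever reaching `writebacks`, which is the memory-traffic saving the abstract describes.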
-
Publication No.: US20250111461A1
Publication Date: 2025-04-03
Application No.: US18374299
Application Date: 2023-09-28
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Alexander Fuad Ashkar , Guennadi Riguer , Nishank Pathak
IPC: G06T1/20
Abstract: A processing system includes two or more graphics cores, each disposed on a respective die and configured for concurrent processing of command packets. To this end, the processing system is configured to determine two or more command partitions associated with a command packet and to assign each command partition to a graphics core. Each graphics core then executes the same command packet by performing only the instructions of the command packet associated with the command partitions assigned to that graphics core. Further, after executing an instruction of the command packet based on one or more assigned partitions, each graphics core adjusts one or more counters used to synchronize the execution of the command packet across the graphics cores.
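As a rough illustration of "every core runs the same packet but only its own partitions", the sketch below uses Python threads as stand-ins for graphics cores on separate dies. Each instruction is tagged with a partition number, each core skips instructions outside its assignment, and a single shared completion counter (an assumed simplification of the "one or more counters") tells the system when the whole packet is done. `Instruction`, `SyncCounter` and `run_core` are hypothetical names.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    partition: int  # command partition this instruction belongs to
    op: str


class SyncCounter:
    """Shared counter cores bump after finishing their partitions (hypothetical)."""

    def __init__(self, target):
        self.target = target
        self.value = 0
        self.cond = threading.Condition()

    def add(self, n):
        with self.cond:
            self.value += n
            self.cond.notify_all()

    def wait_done(self):
        with self.cond:
            self.cond.wait_for(lambda: self.value >= self.target)


def run_core(core_id, packet, assigned, counter, log):
    # Every core walks the SAME packet but skips instructions outside its partitions.
    for instr in packet:
        if instr.partition in assigned:
            log.append((core_id, instr.op))
    counter.add(len(assigned))  # report how many partitions this core completed


packet = [Instruction(p, f"draw_p{p}") for p in range(4)]
assignment = {0: {0, 2}, 1: {1, 3}}  # two cores, two partitions each
counter = SyncCounter(target=4)
log = []
threads = [threading.Thread(target=run_core, args=(c, packet, parts, counter, log))
           for c, parts in assignment.items()]
for t in threads:
    t.start()
counter.wait_done()
for t in threads:
    t.join()
print(sorted(log))
```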
-
Publication No.: US20230206559A1
Publication Date: 2023-06-29
Application No.: US17562653
Application Date: 2021-12-27
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Christopher J. Brennan , Randy Wayne Ramsey , Nishank Pathak , Ricky Wai Yeung Iu , Jimshed Mirza , Anthony Chan
CPC classification number: G06T17/20 , G06T17/10 , G06T15/005 , G06T1/60
Abstract: Systems, apparatuses, and methods for implementing a discard engine in a graphics pipeline are disclosed. A system includes a graphics pipeline with a geometry engine launching shaders that generate attribute data for vertices of each primitive of a set of primitives. The attribute data is consumed by pixel shaders, with each pixel shader generating a deallocation message when the pixel shader no longer needs the attribute data. A discard engine gathers deallocations from multiple pixel shaders and determines when the attribute data is no longer needed. Once a block of attributes has been consumed by all potential pixel shader consumers, the discard engine deallocates the given block of attributes. The discard engine sends a discard command to the caches so that the attribute data can be invalidated and not written back to memory.
-
Publication No.: US12169896B2
Publication Date: 2024-12-17
Application No.: US17489105
Application Date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin , Tad Robert Litwiller , Nishank Pathak , Randy Wayne Ramsey , Michael J. Mantor , Christopher J. Brennan , Mark M. Leather , Ryan James Cash
Abstract: Systems, apparatuses, and methods for preemptively reserving buffer space for primitives and positions in a graphics pipeline are disclosed. A system includes a graphics pipeline frontend with any number of geometry engines coupled to corresponding shader engines. Each geometry engine launches shader wavefronts to execute on a corresponding shader engine. The geometry engine preemptively reserves buffer space for each wavefront prior to the wavefront being launched on the shader engine. When the shader engine executes a wavefront, the shader engine exports primitive and position data to the reserved buffer space. Multiple scan converters will consume the primitive and position data, with each scan converter consuming primitive and position data based on the screen coverage of the scan converter. After consuming the primitive and position data, the scan converters mark the buffer space as freed so that the geometry engine can then allocate the freed buffer space to subsequent shader wavefronts.
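The reserve-before-launch and free-after-consume cycle can be modeled as a slot counter. In this sketch the geometry engine calls `reserve` before launching a wavefront and gets `False` (the launch waits) when space is short. Each scan converter whose screen region the wavefront covers calls `consumed`, and the space returns to the pool once all of them have done so. Requiring every covering scan converter to finish before freeing is an assumption, as are the names `PrimitiveBuffer`, `reserve` and `consumed`.

```python
class PrimitiveBuffer:
    """Free-slot counter standing in for the primitive/position buffer (hypothetical)."""

    def __init__(self, capacity):
        self.free = capacity
        self.reservations = {}  # wave id -> (slots, scan converters still reading)

    def reserve(self, wave_id, slots, covering_scs):
        # Called by the geometry engine BEFORE launching the wavefront.
        if slots > self.free:
            return False  # launch must wait until space is freed
        self.free -= slots
        self.reservations[wave_id] = (slots, set(covering_scs))
        return True

    def consumed(self, wave_id, sc_id):
        # A scan converter finished reading this wave's primitive/position data.
        slots, readers = self.reservations[wave_id]
        readers.discard(sc_id)
        if not readers:
            del self.reservations[wave_id]
            self.free += slots  # space can go to a later wavefront


buf = PrimitiveBuffer(capacity=8)
buf.reserve("w0", 6, covering_scs={0, 1})
ok = buf.reserve("w1", 4, covering_scs={1})   # only 2 slots left
buf.consumed("w0", 0)
buf.consumed("w0", 1)                         # both covering converters done
ok2 = buf.reserve("w1", 4, covering_scs={1})
print(ok, ok2, buf.free)  # False True 4
```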
-
Publication No.: US20230205602A1
Publication Date: 2023-06-29
Application No.: US17564074
Application Date: 2021-12-28
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Yash UKIDAVE , Randy Ramsey , Nishank Pathak , Baturay Turkmen
CPC classification number: G06F9/5083 , G06F9/5038 , G06F9/5044 , G06F9/5016 , G06T1/20
Abstract: Parallel processors typically allocate resources to workloads based on workload priority. Priority inversion of resource allocation between workloads of different priorities reduces the operating efficiency of a parallel processor in some cases. A parallel processor mitigates priority inversion by soft-locking resources to prevent their allocation for the processing of lower priority workloads. Soft-locking is enabled responsive to a soft-lock condition, such as one or more priority inversion heuristics exceeding corresponding thresholds or multiple failed allocations of higher priority workloads within a time period. In some cases, priority inversion heuristics include quantities of higher priority workloads and lower priority workloads that are in-flight or incoming, ratios between such quantities, quantities of render targets, or a combination of these. The soft-lock is released responsive to expiry of a soft-lock timer or incoming or in-flight higher priority workloads falling below a threshold, for example.
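A minimal sketch of the soft-lock idea follows, modeling only one trigger named in the abstract (repeated failed high-priority allocations within a time window) and only one release condition (expiry of a soft-lock timer). The priority-inversion heuristics and the "high-priority work below a threshold" release are left out. `SoftLockAllocator` and its parameters (`fail_limit`, `window`, `lock_time`) are hypothetical, and time is an integer tick supplied by the caller.

```python
from collections import deque


class SoftLockAllocator:
    """Hypothetical model: failure-count trigger and timer release only."""

    def __init__(self, free_units, fail_limit=3, window=100, lock_time=50):
        self.free_units = free_units
        self.fail_limit = fail_limit  # failures within `window` that enable the soft-lock
        self.window = window
        self.lock_time = lock_time    # soft-lock timer duration
        self.failures = deque()       # timestamps of failed high-priority allocations
        self.locked_until = None

    def locked(self, now):
        if self.locked_until is not None and now >= self.locked_until:
            self.locked_until = None  # soft-lock timer expired
        return self.locked_until is not None

    def allocate(self, now, units, high_priority):
        if not high_priority and self.locked(now):
            return False  # resources are soft-locked against low-priority work
        if units <= self.free_units:
            self.free_units -= units
            return True
        if high_priority:
            self.failures.append(now)
            while self.failures and now - self.failures[0] > self.window:
                self.failures.popleft()  # forget failures outside the window
            if len(self.failures) >= self.fail_limit:
                self.locked_until = now + self.lock_time
                self.failures.clear()
        return False

    def release(self, units):
        self.free_units += units


alloc = SoftLockAllocator(free_units=4)
alloc.allocate(0, 4, high_priority=False)        # low priority takes everything
for t in (1, 2, 3):
    alloc.allocate(t, 2, high_priority=True)     # third failure locks until t=53
alloc.release(4)                                 # low-priority work completes
low = alloc.allocate(10, 2, high_priority=False)
high = alloc.allocate(11, 2, high_priority=True)
later = alloc.allocate(60, 2, high_priority=False)
print(low, high, later)  # False True True
```

The lock is "soft" in the sense that it never blocks high-priority requests and lapses on its own, so low-priority work is delayed rather than starved.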
-
Publication No.: US20230095365A1
Publication Date: 2023-03-30
Application No.: US17489059
Application Date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin , Tad Robert Litwiller , Nishank Pathak , Randy Wayne Ramsey
IPC: G06F9/4401 , G06F9/30 , G06F9/54
Abstract: Systems, apparatuses, and methods for performing geometry work in parallel on multiple chiplets are disclosed. A system includes a chiplet processor with multiple chiplets for performing graphics work in parallel. Instead of relying on a central distributor to distribute work to the individual chiplets, each chiplet determines on its own the work to be performed. For example, during a draw call, each chiplet calculates which portions of one or more index buffers, corresponding to one or more graphics objects of the draw call, it should fetch and process. Once the portions are calculated, each chiplet fetches and processes the corresponding indices. The chiplets perform these tasks in parallel and independently of each other. When the index buffers have been processed, one or more subsequent steps in the graphics rendering process are performed in parallel by the chiplets.
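The key property is that no distributor is needed because every chiplet runs the same deterministic calculation with its own ID. The sketch below assumes a round-robin interleave of fixed-size index chunks; the abstract does not say how the portions are chosen, so the policy, the chunk size of 96 indices (a multiple of 3 so triangle-list primitives are not split) and the function name `chiplet_ranges` are all assumptions.

```python
def chiplet_ranges(chiplet_id, num_chiplets, index_count, chunk=96):
    """Half-open [start, end) index-buffer ranges this chiplet fetches.

    Assumed policy: round-robin interleaving of fixed-size chunks. Every
    chiplet evaluates this same pure function, so the ranges are disjoint
    and cover the buffer without any central coordination."""
    ranges = []
    start = chiplet_id * chunk
    while start < index_count:
        ranges.append((start, min(start + chunk, index_count)))
        start += num_chiplets * chunk
    return ranges


for c in range(2):
    print(c, chiplet_ranges(c, 2, 500))
# 0 [(0, 96), (192, 288), (384, 480)]
# 1 [(96, 192), (288, 384), (480, 500)]
```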
-
Publication No.: US11210757B2
Publication Date: 2021-12-28
Application No.: US16713472
Application Date: 2019-12-13
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Todd Martin , Tad Litwiller , Nishank Pathak , Mangesh P. Nijasure
IPC: G06T1/20 , H04L12/863 , H04L12/861
Abstract: A graphics processing unit (GPU) includes a packet management component that automatically aggregates data from input packets. In response to determining that a received first input packet does not indicate a send condition, and in response to determining that a generated output packet would be smaller than an output size threshold, the packet management component aggregates data corresponding to the first input packet with data corresponding to a second input packet stored at a packet buffer. In response to determining that a received third input packet indicates a send condition, the packet management component sends the aggregated data to a compute unit in an output packet and performs an operation indicated by the send condition.
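The aggregation rule can be written as a small state machine over a byte buffer. Two details are assumptions not stated in the abstract: the packet carrying the send condition contributes its own data to the outgoing packet, and a packet that would push the buffer to the size threshold also triggers a send. `InputPacket`, `PacketManager` and the `"flush_and_signal"` condition name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputPacket:
    data: bytes
    send_condition: Optional[str] = None  # hypothetical: name of the operation to perform


class PacketManager:
    def __init__(self, size_threshold, send_to_cu, perform_op):
        self.size_threshold = size_threshold  # output size threshold in bytes
        self.buffer = bytearray()             # the packet buffer
        self.send_to_cu = send_to_cu
        self.perform_op = perform_op

    def receive(self, pkt):
        would_be = len(self.buffer) + len(pkt.data)
        self.buffer += pkt.data
        if pkt.send_condition is None and would_be < self.size_threshold:
            return  # no send condition and still small: keep aggregating
        # Send condition seen, or the output packet has reached the threshold.
        self.send_to_cu(bytes(self.buffer))
        self.buffer.clear()
        if pkt.send_condition is not None:
            self.perform_op(pkt.send_condition)


sent, ops = [], []
pm = PacketManager(16, sent.append, ops.append)
pm.receive(InputPacket(b"aaaa"))
pm.receive(InputPacket(b"bbbb"))
pm.receive(InputPacket(b"cc", send_condition="flush_and_signal"))
print(sent, ops)  # [b'aaaabbbbcc'] ['flush_and_signal']
```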
-
Publication No.: US12062126B2
Publication Date: 2024-08-13
Application No.: US17489008
Application Date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin , Tad Robert Litwiller , Nishank Pathak , Randy Wayne Ramsey
IPC: G06T15/00
CPC classification number: G06T15/005
Abstract: Systems, apparatuses, and methods for loading multiple primitives per thread in a graphics pipeline are disclosed. A system includes a graphics pipeline frontend with a geometry engine, shader processor input (SPI), and a plurality of compute units. The geometry engine generates primitives which are accumulated by the SPI into primitive groups. While accumulating primitives, the SPI tracks the number of vertices and primitives per group. The SPI determines wavefront boundaries based on mapping a single vertex to each thread of the wavefront while allowing more than one primitive per thread. The SPI launches wavefronts with one vertex per thread and potentially multiple primitives per thread. The compute units execute a vertex phase and a multi-cycle primitive phase for wavefronts with multiple primitives per thread.
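Because a closed triangle mesh has roughly twice as many triangles as vertices, sizing a wavefront by vertices rather than primitives lets more primitives ride in each wave. The sketch below packs a triangle stream so each wave holds at most `wave_size` unique vertices (one per thread) and at most `max_prims_per_thread * wave_size` primitives, and reports how many passes the primitive phase needs. The cap of two primitives per thread, the wave size of 32 and all function names are assumptions for illustration.

```python
import math


def grid_triangles(rows, cols):
    """Triangle list over a rows x cols vertex grid, used as test input."""
    tris = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            v = r * cols + c
            tris.append((v, v + 1, v + cols))
            tris.append((v + 1, v + cols + 1, v + cols))
    return tris


def _summary(verts, prims, wave_size):
    return {"vertices": len(verts), "primitives": prims,
            "prim_cycles": math.ceil(prims / wave_size)}  # passes of the primitive phase


def pack_wavefronts(primitives, wave_size=32, max_prims_per_thread=2):
    """Split a primitive stream into wavefronts: one unique vertex per thread,
    up to max_prims_per_thread primitives per thread (assumed cap)."""
    waves, verts, prims = [], set(), 0
    for prim in primitives:
        grown = verts | set(prim)
        too_many_verts = len(grown) > wave_size
        too_many_prims = prims + 1 > wave_size * max_prims_per_thread
        if prims and (too_many_verts or too_many_prims):
            waves.append(_summary(verts, prims, wave_size))  # close the wave
            grown, prims = set(prim), 0
        verts, prims = grown, prims + 1
    if prims:
        waves.append(_summary(verts, prims, wave_size))
    return waves


waves = pack_wavefronts(grid_triangles(4, 8) + [(100, 101, 102)])
print(waves)
# [{'vertices': 32, 'primitives': 42, 'prim_cycles': 2}, {'vertices': 3, 'primitives': 1, 'prim_cycles': 1}]
```

With one primitive per thread, the 42 grid triangles would need two 32-thread wavefronts; here they fit in one wave whose primitive phase runs for two cycles.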
-
Publication No.: US11755336B2
Publication Date: 2023-09-12
Application No.: US17489059
Application Date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Todd Martin , Tad Robert Litwiller , Nishank Pathak , Randy Wayne Ramsey
IPC: G06T1/60 , G06F9/4401 , G06F9/30 , G06F9/54
CPC classification number: G06F9/4411 , G06F9/3009 , G06F9/544
Abstract: Systems, apparatuses, and methods for performing geometry work in parallel on multiple chiplets are disclosed. A system includes a chiplet processor with multiple chiplets for performing graphics work in parallel. Instead of relying on a central distributor to distribute work to the individual chiplets, each chiplet determines on its own the work to be performed. For example, during a draw call, each chiplet calculates which portions of one or more index buffers, corresponding to one or more graphics objects of the draw call, it should fetch and process. Once the portions are calculated, each chiplet fetches and processes the corresponding indices. The chiplets perform these tasks in parallel and independently of each other. When the index buffers have been processed, one or more subsequent steps in the graphics rendering process are performed in parallel by the chiplets.
-
Publication No.: US11508124B2
Publication Date: 2022-11-22
Application No.: US17121965
Application Date: 2020-12-15
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Nishank Pathak
Abstract: A processing system includes hull shader circuitry that launches thread groups including one or more primitives. The hull shader circuitry also generates tessellation factors that indicate subdivisions of the primitives. The processing system also includes throttling circuitry that estimates a primitive launch time interval for a domain shader based on the tessellation factors and selectively throttles launching of the thread groups from the hull shader circuitry based on the primitive launch time interval of the domain shader and a hull shader latency. In some cases, the throttling circuitry includes a first counter that is incremented in response to launching a thread group from the buffer and a second counter that modifies the first counter based on a measured latency of the domain shader.
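One way to read the throttling rule: there is no point launching another hull shader thread group while the domain shader already has more queued work than the hull shader latency, because the new group's output would arrive long before the domain shader could use it. The sketch below follows that reading. The primitive estimate (two triangles per inner cell of a quad-domain patch with integer partitioning) is a rough assumption, as are the per-primitive cost and all names (`HullShaderThrottle`, `on_tess_factors`, `on_ds_progress`). It does not reproduce the two-counter arrangement described in the abstract; a single estimated-work counter that is reduced by measured progress plays a loosely similar role.

```python
def estimated_ds_primitives(inner_u, inner_v):
    # Rough estimate for a quad-domain patch with integer partitioning:
    # an inner_u x inner_v grid of cells, two triangles per cell.
    return 2 * inner_u * inner_v


class HullShaderThrottle:
    """Hypothetical model: hold back the next HS thread group while the domain
    shader already has more queued work than the HS latency can hide."""

    def __init__(self, hs_latency, cycles_per_ds_prim):
        self.hs_latency = hs_latency                  # cycles until a new group yields DS work
        self.cycles_per_ds_prim = cycles_per_ds_prim  # assumed DS launch cost per primitive
        self.pending_ds_cycles = 0                    # estimated DS work still queued

    def may_launch(self):
        return self.pending_ds_cycles <= self.hs_latency

    def on_tess_factors(self, patches):
        """patches: (inner_u, inner_v) tessellation factors produced by the HS."""
        prims = sum(estimated_ds_primitives(u, v) for u, v in patches)
        self.pending_ds_cycles += prims * self.cycles_per_ds_prim

    def on_ds_progress(self, elapsed_cycles):
        # Retire DS work observed to have completed (measured, not estimated).
        self.pending_ds_cycles = max(0, self.pending_ds_cycles - elapsed_cycles)


throttle = HullShaderThrottle(hs_latency=200, cycles_per_ds_prim=1)
results = [throttle.may_launch()]              # empty DS queue
throttle.on_tess_factors([(8, 8), (4, 4)])     # 128 + 32 = 160 cycles queued
results.append(throttle.may_launch())
throttle.on_tess_factors([(8, 8)])             # 288 cycles queued
results.append(throttle.may_launch())
throttle.on_ds_progress(100)                   # 188 cycles left
results.append(throttle.may_launch())
print(results)  # [True, True, False, True]
```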