HARDWARE ENHANCEMENTS FOR MATRIX LOAD/STORE INSTRUCTIONS

    Publication number: US20240069914A1

    Publication date: 2024-02-29

    Application number: US17893985

    Filing date: 2022-08-23

    CPC classification number: G06F9/30036 G06F9/30043 G06F9/3455 G06F9/3877

    Abstract: Embodiments described herein provide a system to enable access to an n-dimensional tensor in memory of a graphics processor via a batch of two-dimensional block access messages. One embodiment provides a graphics processor comprising general-purpose graphics execution resources coupled with the system interface, the general-purpose graphics execution resources including a matrix accelerator. The matrix accelerator is configured to perform a matrix operation on a plurality of tensors stored in a memory. Circuitry is included to facilitate access to the memory by the general-purpose graphics execution resources. The circuitry is configured to receive a request to access a tensor of the plurality of tensors and generate a batch of two-dimensional block access messages along a dimension of n>2 of the tensor. The batch of two-dimensional block access messages enables access to the tensor by the matrix accelerator.
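
    The decomposition described in this abstract can be illustrated with a minimal software sketch. It is not the patented circuitry: it assumes a contiguous, row-major tensor, treats the two innermost dimensions as the 2D block, and emits one message per index combination of the remaining dimensions. The names `Block2DMessage` and `batch_2d_messages`, and the message fields (`base_offset`, `width`, `height`, `pitch`), are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Sequence


@dataclass(frozen=True)
class Block2DMessage:
    """Hypothetical 2D block access descriptor (not the patent's message format)."""
    base_offset: int  # byte offset of the block's first element
    width: int        # elements per row (innermost dimension)
    height: int       # rows per block (second-innermost dimension)
    pitch: int        # byte stride between consecutive rows


def batch_2d_messages(shape: Sequence[int], elem_size: int) -> Iterator[Block2DMessage]:
    """Yield one 2D block message per plane of an n-dimensional tensor.

    Assumes a contiguous row-major layout, so consecutive planes are
    exactly height * width * elem_size bytes apart.
    """
    if len(shape) < 2:
        raise ValueError("tensor must have at least two dimensions")
    *outer, height, width = shape
    pitch = width * elem_size
    plane_bytes = height * pitch
    # Walk every index combination of the outer (n > 2) dimensions in
    # row-major order; the running count is the plane number.
    outer_indices = product(*(range(extent) for extent in outer))
    for plane, _ in enumerate(outer_indices):
        yield Block2DMessage(plane * plane_bytes, width, height, pitch)


# Example: a 2 x 3 x 4 x 5 tensor of 4-byte elements becomes 6 messages,
# each describing a 4-row by 5-column block.
messages = list(batch_2d_messages((2, 3, 4, 5), elem_size=4))
```

    A plain two-dimensional tensor yields a single message under this sketch, which matches the idea that batching only comes into play for the dimensions beyond the second.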

    Handling pipeline submissions across many compute units

    Publication number: US11803934B2

    Publication date: 2023-10-31

    Application number: US17591152

    Filing date: 2022-02-02

    CPC classification number: G06T1/20 G06T15/005 G06T2200/04

    Abstract: One embodiment provides an apparatus comprising an interconnect fabric comprising one or more fabric switches, a plurality of memory interfaces coupled to the interconnect fabric to provide access to a plurality of memory devices, an input/output (IO) interface coupled to the interconnect fabric to provide access to IO devices, an array of multiprocessors coupled to the interconnect fabric, scheduling circuitry to distribute a plurality of thread groups across the array of multiprocessors, each thread group comprising a plurality of threads and each thread comprising a plurality of instructions to be executed by at least one of the multiprocessors, and a first multiprocessor of the array of multiprocessors to be assigned to process a first thread group comprising a first plurality of threads, the first multiprocessor comprising a plurality of parallel execution circuits.
