MULTI-TILE GRAPHICS PROCESSING UNIT

    Publication (Announcement) No.: US20230051227A1

    Publication (Announcement) Date: 2023-02-16

    Application No.: US17978426

    Application Date: 2022-11-01

    Abstract: An apparatus to facilitate processing in a multi-tile device is disclosed. In one embodiment, the apparatus includes a graphics processor comprising a first semiconductor die including a first high-bandwidth memory (HBM) device, a second semiconductor die including a second HBM device, and a third semiconductor die coupled with the first semiconductor die and the second semiconductor die in a 2.5-dimensional (2.5D) arrangement. The third semiconductor die includes a graphics processing resource and a cache coupled with the graphics processing resource. The cache is configurable to cache data associated with memory accessed by the graphics processing resource and the graphics processing resource includes a general-purpose graphics processor core and a tensor core.

    Asynchronous input dependency resolution mechanism

    Publication (Announcement) No.: US12199759B2

    Publication (Announcement) Date: 2025-01-14

    Application No.: US17688028

    Application Date: 2022-03-07

    Abstract: Described herein is a graphics processor configured to perform asynchronous input dependency resolution among a group of interdependent workloads. The graphics processor can dynamically resolve input dependencies among the workloads according to a dependency relationship defined for the workloads. Dependency resolution can be performed via a deferred submission mode, which resolves input dependencies prior to thread dispatch to the processing resources, or via an immediate submission mode, which resolves input dependencies at the processing resources.
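
The difference between the two submission modes named in this abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions: the function `simulate`, the mode strings, and the event log format are hypothetical and are not taken from the patent; the code models only the ordering of dispatch relative to dependency resolution, not any hardware behavior.

```python
def simulate(workloads, deps, mode):
    """Return a log of (event, workload) tuples for a toy dispatch model.

    workloads: list of workload names (hypothetical).
    deps: dict mapping a workload to the set of workloads it needs as input.
    mode: "deferred" or "immediate" (labels assumed for this sketch).
    """
    if mode not in ("deferred", "immediate"):
        raise ValueError("unknown mode: " + mode)

    done = set()
    log = []
    pending = list(workloads)

    if mode == "immediate":
        # Assumption: every workload is dispatched up front and then waits
        # at the processing resource until its inputs are complete.
        for w in pending:
            log.append(("dispatch", w))

    while pending:
        # A workload is runnable once all of its inputs have completed.
        runnable = [w for w in pending if deps.get(w, set()) <= done]
        if not runnable:
            raise ValueError("unresolvable dependency")
        for w in runnable:
            if mode == "deferred":
                # Assumption: dispatch is held back until inputs are resolved.
                log.append(("dispatch", w))
            log.append(("execute", w))
            done.add(w)
            pending.remove(w)
    return log


if __name__ == "__main__":
    deps = {"b": {"a"}}
    print(simulate(["a", "b"], deps, "deferred"))
    print(simulate(["a", "b"], deps, "immediate"))
```

In the deferred case each dispatch appears in the log only after the inputs have executed; in the immediate case all dispatches come first and the waiting happens afterwards.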

    Scheduling and dispatch of GPGPU workloads

    Publication (Announcement) No.: US10235732B2

    Publication (Announcement) Date: 2019-03-19

    Application No.: US14142681

    Application Date: 2013-12-27

    Abstract: A method and system are described herein for an optimization technique on two aspects of thread scheduling and dispatch when the driver is allowed to pick the scheduling attributes. The present techniques rely on an enhanced GPGPU Walker hardware command and one dimensional local identification generation to maximize thread residency.

    Multi-tile graphics processing unit

    Publication (Announcement) No.: US12217327B2

    Publication (Announcement) Date: 2025-02-04

    Application No.: US17978426

    Application Date: 2022-11-01

    Abstract: An apparatus to facilitate processing in a multi-tile device is disclosed. In one embodiment, the apparatus includes a graphics processor comprising a first semiconductor die including a first high-bandwidth memory (HBM) device, a second semiconductor die including a second HBM device, and a third semiconductor die coupled with the first semiconductor die and the second semiconductor die in a 2.5-dimensional (2.5D) arrangement. The third semiconductor die includes a graphics processing resource and a cache coupled with the graphics processing resource. The cache is configurable to cache data associated with memory accessed by the graphics processing resource and the graphics processing resource includes a general-purpose graphics processor core and a tensor core.

    Method and apparatus for a highly efficient graphics processing unit (GPU) execution model

    Publication (Announcement) No.: US10521874B2

    Publication (Announcement) Date: 2019-12-31

    Application No.: US14498220

    Application Date: 2014-09-26

    Abstract: An apparatus and method are described for executing workloads without host intervention. For example, one embodiment of an apparatus comprises: a host processor; and a graphics processor unit (GPU) to execute a hierarchical workload responsive to one or more commands issued by the host processor, the hierarchical workload comprising a parent workload and a plurality of child workloads interconnected in a logical graph structure; and a scheduler kernel implemented by the GPU to schedule execution of the plurality of child workloads without host intervention, the scheduler kernel to evaluate conditions required for execution of the child workloads and determine an order in which to execute the child workloads on the GPU based on the evaluated conditions; the GPU to execute the child workloads in the order determined by the scheduler kernel and to provide results of parent and child workloads to the host processor following execution of all of the child workloads.
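
The scheduling idea in this abstract (child workloads arranged in a logical graph, with a scheduler that evaluates their conditions and decides an execution order) can be sketched on the host side as a readiness-based ordering. This is an illustrative sketch only: the function `schedule_children`, the representation of conditions as "all listed predecessors have finished", and the workload names are assumptions, not the patented scheduler kernel.

```python
from collections import deque


def schedule_children(children, deps):
    """Return an execution order for child workloads in a dependency graph.

    children: iterable of child workload names (hypothetical).
    deps: dict mapping a child to the set of children it must wait for.
          Assumption: every name in deps also appears in children.
    """
    children = list(children)
    remaining = {c: set(deps.get(c, ())) for c in children}

    # Reverse edges: which children are waiting on each workload.
    dependents = {c: [] for c in children}
    for child, needed in remaining.items():
        for d in needed:
            dependents[d].append(child)

    # Children with no unmet conditions are ready immediately.
    ready = deque(c for c in children if not remaining[c])
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        # Re-evaluate the conditions of everything waiting on `current`.
        for nxt in dependents[current]:
            remaining[nxt].discard(current)
            if not remaining[nxt]:
                ready.append(nxt)

    if len(order) != len(children):
        raise ValueError("cycle or unsatisfiable condition among child workloads")
    return order


if __name__ == "__main__":
    graph = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    print(schedule_children(["a", "b", "c", "d"], graph))
```

The loop never consults a host: readiness is re-evaluated as each child finishes, which is the property the abstract attributes to scheduling "without host intervention".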

    SCHEDULING AND DISPATCH OF GPGPU WORKLOADS
    Invention application

    Publication (Announcement) No.: US20190259129A1

    Publication (Announcement) Date: 2019-08-22

    Application No.: US16260031

    Application Date: 2019-01-28

    Abstract: A method and system are described herein for an optimization technique on two aspects of thread scheduling and dispatch when the driver is allowed to pick the scheduling attributes. The present techniques rely on an enhanced GPGPU Walker hardware command and one dimensional local identification generation to maximize thread residency.

    ASYNCHRONOUS INPUT DEPENDENCY RESOLUTION MECHANISM

    Publication (Announcement) No.: US20220291955A1

    Publication (Announcement) Date: 2022-09-15

    Application No.: US17688028

    Application Date: 2022-03-07

    Abstract: Described herein is a graphics processor configured to perform asynchronous input dependency resolution among a group of interdependent workloads. The graphics processor can dynamically resolve input dependencies among the workloads according to a dependency relationship defined for the workloads. Dependency resolution can be performed via a deferred submission mode, which resolves input dependencies prior to thread dispatch to the processing resources, or via an immediate submission mode, which resolves input dependencies at the processing resources.

    Scheduling and dispatch of GPGPU workloads

    Publication (Announcement) No.: US10937118B2

    Publication (Announcement) Date: 2021-03-02

    Application No.: US16260031

    Application Date: 2019-01-28

    Abstract: A method and system are described herein for an optimization technique on two aspects of thread scheduling and dispatch when the driver is allowed to pick the scheduling attributes. The present techniques rely on an enhanced GPGPU Walker hardware command and one dimensional local identification generation to maximize thread residency.
