PARTIAL WRITE MANAGEMENT IN A MULTI-TILED COMPUTE ENGINE

    Publication number: US20210056028A1

    Publication date: 2021-02-25

    Application number: US17068754

    Application date: 2020-10-12

    Abstract: Embodiments described herein provide a general purpose graphics processor comprising a plurality of tiles, each tile of the plurality of tiles comprising at least one execution unit, a local cache, and a cache control unit, and a high bandwidth memory communicatively coupled to the plurality of tiles, wherein the high bandwidth memory is shared between the plurality of tiles. The cache control unit is to implement a partial write management protocol to receive a partial write operation directed to a cache line in the local cache, the partial write operation comprising write data, write the data associated with the partial write operation to the local cache when the cache line is in a modified state, and forward the write data associated with the partial write operation to the high bandwidth memory when the partial write operation triggers a cache miss or when the cache line is in an exclusive state or a shared state. Other embodiments may be described and claimed.
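The routing rule in this abstract can be sketched as a small software model. This is only an illustration of the decision described above, not the patented hardware design: the names `Tile`, `CacheLine`, `LineState`, `partial_write`, and the 64-byte `LINE_SIZE` are assumptions made for the example, and the abstract does not say what happens to the local copy of a line after a forwarded write, so the sketch leaves it untouched.

```python
from dataclasses import dataclass, field
from enum import Enum

LINE_SIZE = 64  # assumed cache-line size in bytes; not stated in the abstract


class LineState(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"


@dataclass
class CacheLine:
    state: LineState
    data: bytearray


@dataclass
class Tile:
    """Hypothetical model of one tile: a local cache in front of shared HBM."""
    hbm: dict  # shared between tiles: line address -> bytearray
    local_cache: dict = field(default_factory=dict)  # line address -> CacheLine

    def partial_write(self, addr, offset, write_data):
        """Apply a partial write and report where the data went."""
        line = self.local_cache.get(addr)
        if line is not None and line.state is LineState.MODIFIED:
            # Hit on a modified line: the write is absorbed by the local cache.
            line.data[offset:offset + len(write_data)] = write_data
            return "local_cache"
        # Cache miss, or hit on an exclusive or shared line:
        # forward the write data to the shared high bandwidth memory.
        backing = self.hbm.setdefault(addr, bytearray(LINE_SIZE))
        backing[offset:offset + len(write_data)] = write_data
        return "hbm"
```

In this model, two `Tile` objects constructed with the same `hbm` dictionary see each other's forwarded writes, which mirrors the shared high bandwidth memory in the abstract. Any invalidation or refresh of the local line after a forwarded write is outside what the sketch covers.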

Block operation based acceleration

    Granted invention patent

    Publication number: US09811334B2

    Publication date: 2017-11-07

    Application number: US14099215

    Application date: 2013-12-06

    Inventor: Ben Ashbaugh

    CPC classification number: G06F9/30 G06F12/00 G06T1/20 G06T1/60

    Abstract: Apparatuses, systems, and methods may implement a block operation on a data block. The block operation may include a data transfer event involving system memory to be performed by an element block independently of shared local memory. The block operation may also include a data transfer event involving system memory to be performed by the element block using one memory address for the element block. In addition, the block operation may include a data transfer event including a data register and/or excluding shared local memory to be performed by the element block. The block operation may include a data transfer event involving one or more rows of data. The width of the data block may be implicitly defined, based on the number of elements in the element block. In one example, the block operation may be implemented for a scalar, or single instruction multiple thread program as a built-in function.
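A block read of the kind this abstract describes can be approximated in a few lines. This is a loose sketch under stated assumptions, not the built-in function from the patent: `block_read`, its parameters, and the flat-list model of system memory are all hypothetical. It shows the two properties named above: the element block supplies a single base address, and the block width is taken implicitly from the number of elements in the element block.

```python
def block_read(memory, base, pitch, rows, num_elements):
    """Read a rows x num_elements block straight into per-element registers.

    memory       -- flat list standing in for system memory (assumption)
    base         -- the one address supplied for the whole element block
    pitch        -- distance in elements between the starts of adjacent rows
    rows         -- number of rows of data to transfer
    num_elements -- size of the element block; also the implicit block width
    """
    width = num_elements  # width is not passed separately; it is implied
    # registers[i][r] holds the value delivered to element i for row r.
    registers = [[] for _ in range(num_elements)]
    for r in range(rows):
        row_start = base + r * pitch
        for i in range(width):
            # Data lands in a register; no shared local memory is modeled.
            registers[i].append(memory[row_start + i])
    return registers
```

For example, with `memory = list(range(32))`, the call `block_read(memory, 2, 8, 2, 4)` gives element 0 the values at addresses 2 and 10, and element 3 the values at addresses 5 and 13.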

    Multi-tile graphics processing unit

    Publication number: US12217327B2

    Publication date: 2025-02-04

    Application number: US17978426

    Application date: 2022-11-01

    Abstract: An apparatus to facilitate processing in a multi-tile device is disclosed. In one embodiment, the apparatus includes a graphics processor comprising a first semiconductor die including a first high-bandwidth memory (HBM) device, a second semiconductor die including a second HBM device, and a third semiconductor die coupled with the first semiconductor die and the second semiconductor die in a 2.5-dimensional (2.5D) arrangement. The third semiconductor die includes a graphics processing resource and a cache coupled with the graphics processing resource. The cache is configurable to cache data associated with memory accessed by the graphics processing resource and the graphics processing resource includes a general-purpose graphics processor core and a tensor core.
