SYSTEM AND METHOD TO SUPPORT MULTIPLE WALKERS PER COMMAND

    Publication Number: US20200286201A1

    Publication Date: 2020-09-10

    Application Number: US16297129

    Filing Date: 2019-03-08

    Abstract: Embodiments described herein provide an apparatus comprising a processor to: configure a plurality of contexts of a command engine to execute a graphics workload comprising a plurality of walkers; allocate, from a pool of execution units of a graphics processor, a subset of execution units to each walker in the plurality of walkers based at least in part on a predetermined number of walkers configured for each context; for each context in the plurality of contexts, dispatch one or more walkers of the plurality of walkers to the execution units; and, upon dispatch of the one or more walkers of the plurality of walkers, write an opcode to a computer-readable memory indicating that the dispatch of the walker is complete, wherein the opcode comprises dependency data for the one or more walkers of the plurality of walkers. Other embodiments may be described and claimed.
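
    The dispatch flow in this abstract lends itself to a small behavioral sketch: partition a pool of execution units among the walkers of a context and, after each walker is dispatched, write a completion opcode carrying dependency data. The Python below is a hypothetical illustration only; the names (CommandEngine, Walker, DISPATCH_DONE) and the even split of the EU pool are assumptions, not the claimed hardware.

```python
# Hypothetical sketch of "multiple walkers per command"; all names and the
# round-robin EU allocation policy are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Dict, List

DISPATCH_DONE = 0x1  # assumed opcode meaning "walker dispatch complete"

@dataclass
class Walker:
    walker_id: int
    depends_on: List[int] = field(default_factory=list)  # dependency data

@dataclass
class Context:
    context_id: int
    walkers: List[Walker]

class CommandEngine:
    def __init__(self, eu_pool_size: int):
        self.eu_pool = list(range(eu_pool_size))
        self.memory: List[Dict] = []  # stands in for computer-readable memory

    def execute(self, contexts: List[Context]) -> None:
        for ctx in contexts:
            n = len(ctx.walkers)                       # walkers configured for this context
            per_walker = max(1, len(self.eu_pool) // n)  # EUs allocated per walker
            for i, walker in enumerate(ctx.walkers):
                eus = self.eu_pool[i * per_walker:(i + 1) * per_walker]
                self.dispatch(walker, eus)
                # After dispatch, record completion plus dependency data.
                self.memory.append({
                    "opcode": DISPATCH_DONE,
                    "context": ctx.context_id,
                    "walker": walker.walker_id,
                    "depends_on": walker.depends_on,
                })

    def dispatch(self, walker: Walker, eus: List[int]) -> None:
        print(f"walker {walker.walker_id} -> EUs {eus}")

engine = CommandEngine(eu_pool_size=8)
engine.execute([Context(0, [Walker(0), Walker(1, depends_on=[0])])])
```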

    Instruction prefetch based on thread dispatch commands

    Publication Number: US12124852B2

    Publication Date: 2024-10-22

    Application Number: US18347964

    Filing Date: 2023-07-06

    CPC classification number: G06F9/3802 G06F13/28 G06T1/20

    Abstract: A graphics processing device is provided that includes a set of compute units to execute a workload, a cache coupled with the set of compute units, and circuitry coupled with the cache and the set of compute units. The circuitry is configured to, in response to a cache miss for a read from a first cache, broadcast an event within the graphics processing device to identify data associated with the cache miss, receive the event at a second compute unit in the set of compute units, and prefetch the data identified by the event into a second cache that is local to the second compute unit before an attempt to read the instruction or data by a second thread.
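
    The prefetch scheme in this abstract (a miss in one compute unit's cache triggers a broadcast event naming the missed data, and other compute units pull that data into their own local caches before their threads try to read it) can be modeled in a few lines. The following Python is a hypothetical simulation; the ComputeUnit and Event classes and the dict-backed memory and caches are illustrative assumptions.

```python
# Hypothetical sketch of miss-broadcast prefetching across compute units.
class Event:
    def __init__(self, address: int):
        self.address = address  # identifies the data that missed

class ComputeUnit:
    def __init__(self, cu_id: int, memory: dict, fabric: list):
        self.cu_id = cu_id
        self.memory = memory      # backing store shared by all compute units
        self.local_cache = {}     # per-CU local cache
        self.fabric = fabric      # all CUs reachable by a broadcast

    def read(self, address: int):
        if address in self.local_cache:
            return self.local_cache[address]          # hit
        # Miss: fetch from memory and broadcast an event naming the address.
        value = self.memory[address]
        self.local_cache[address] = value
        self.broadcast(Event(address))
        return value

    def broadcast(self, event: Event):
        for cu in self.fabric:
            if cu is not self:
                cu.on_event(event)

    def on_event(self, event: Event):
        # Prefetch the identified data into this CU's local cache before a
        # thread on this CU attempts the same read.
        if event.address not in self.local_cache:
            self.local_cache[event.address] = self.memory[event.address]

memory = {0x100: "shader-instructions"}
fabric = []
cu0, cu1 = ComputeUnit(0, memory, fabric), ComputeUnit(1, memory, fabric)
fabric.extend([cu0, cu1])
cu0.read(0x100)                  # miss on CU0 triggers the broadcast
assert 0x100 in cu1.local_cache  # CU1 already holds the data
```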

    BASE PLUS OFFSET ADDRESSING FOR LOAD/STORE MESSAGES

    Publication Number: US20240095038A1

    Publication Date: 2024-03-21

    Application Number: US17949904

    Filing Date: 2022-09-21

    CPC classification number: G06F15/7839 G06F9/30043

    Abstract: Embodiments described herein provide a technique to decompose 64-bit per-lane virtual addresses to access a plurality of data elements on behalf of a multi-lane parallel processing execution resource of a graphics or compute accelerator. The 64-bit per-lane addresses are decomposed into a base address and a plurality of per-lane offsets for transmission to memory access circuitry. The memory access circuitry then combines the base address and the per-lane offsets to reconstruct the per-lane addresses.
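
    A worked example clarifies the decomposition: rather than sending sixteen full 64-bit addresses, the sender transmits one shared base plus sixteen small per-lane offsets, and the memory access circuitry adds them back together. The sketch below is illustrative; choosing the minimum lane address as the base is an assumption, not necessarily the patented decomposition.

```python
# Hypothetical sketch of base-plus-offset address decomposition for SIMD lanes.
def decompose(lane_addresses):
    """Split full 64-bit per-lane addresses into (base, offsets) for transmission."""
    base = min(lane_addresses)                           # one shared base address
    offsets = [addr - base for addr in lane_addresses]   # small per-lane offsets
    return base, offsets

def reconstruct(base, offsets):
    """Memory access circuitry recombines base + offset into full per-lane addresses."""
    return [base + off for off in offsets]

lanes = [0x7F00_0000_1000 + 8 * i for i in range(16)]  # 16 lanes, 8-byte stride
base, offsets = decompose(lanes)
assert reconstruct(base, offsets) == lanes
```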

    UNIFIED SUBMIT PORT FOR GRAPHICS PROCESSING

    Publication Number: US20230094002A1

    Publication Date: 2023-03-30

    Application Number: US17484711

    Filing Date: 2021-09-24

    Abstract: Dynamic routing of texture loads in graphics processing is described. An example of an apparatus includes a graphics processor including a plurality of processing engines of a class of processing engines of the graphics processor; a set of queues for the plurality of processing engines; and a unified submit port for the plurality of processing engines, wherein the unified submit port is to notify a scheduler regarding availability of slots in the set of queues for receipt of workload contexts; and wherein, upon the unified submit port receiving a workload context for processing by the plurality of processing engines, the unified submit port is to detect an available processing engine of the plurality of processing engines and direct the received context to a slot of the set of queues for processing by the available processing engine.
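
    The submit-port behavior described in this abstract (advertise free queue slots to a scheduler, then steer each incoming workload context to a queue slot of an available engine) can be outlined as follows. The Python sketch is hypothetical; the queue depth, the least-loaded selection policy, and the Scheduler callback name are assumptions.

```python
# Hypothetical sketch of a unified submit port fronting several same-class engines.
from collections import deque

class Scheduler:
    def on_slots_available(self, n: int):
        print(f"scheduler sees {n} free slots")

class UnifiedSubmitPort:
    def __init__(self, num_engines: int, queue_depth: int, scheduler: Scheduler):
        self.queues = [deque(maxlen=queue_depth) for _ in range(num_engines)]
        self.scheduler = scheduler

    def free_slots(self) -> int:
        return sum(q.maxlen - len(q) for q in self.queues)

    def advertise(self):
        # Notify the scheduler how many queue slots can accept new contexts.
        self.scheduler.on_slots_available(self.free_slots())

    def submit(self, context):
        # Detect an available engine (a queue with room) and direct the
        # received context to one of its slots.
        candidates = [q for q in self.queues if len(q) < q.maxlen]
        if not candidates:
            raise RuntimeError("no queue slot available")
        target = min(candidates, key=len)   # assumed least-loaded policy
        target.append(context)
        self.advertise()

port = UnifiedSubmitPort(num_engines=4, queue_depth=2, scheduler=Scheduler())
port.advertise()
port.submit({"workload": "compute-kernel-0"})
```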

    Partial write management in a multi-tiled compute engine

    Publication Number: US11301384B2

    Publication Date: 2022-04-12

    Application Number: US17068754

    Filing Date: 2020-10-12

    Abstract: Embodiments described herein provide a general purpose graphics processor comprising a plurality of tiles, each tile of the plurality of tiles comprising at least one execution unit, a local cache, and a cache control unit, and a high bandwidth memory communicatively coupled to the plurality of tiles, wherein the high bandwidth memory is shared between the plurality of tiles. The cache control unit is to implement a partial write management protocol to receive a partial write operation directed to a cache line in the local cache, the partial write operation comprising write data, write the write data associated with the partial write operation to the local cache when the cache line is in a modified state, and forward the write data associated with the partial write operation to the high bandwidth memory when the partial write operation triggers a cache miss or when the cache line is in an exclusive state or a shared state. Other embodiments may be described and claimed.
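
    The partial-write decision reduces to a small state check: merge the write into the local cache line only when that line is already in the modified state, and otherwise (miss, exclusive, or shared) forward the write data to the high bandwidth memory. The following Python sketch is a hypothetical model; the enum names and the dict-backed cache and HBM stand-ins are assumptions, not the claimed design.

```python
# Hypothetical sketch of the partial-write management decision in a tile's
# cache control unit.
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"

class CacheControlUnit:
    def __init__(self):
        self.local_cache = {}   # line address -> (state, 64-byte line data)
        self.hbm_writes = []    # partial writes forwarded to high bandwidth memory

    def partial_write(self, line_addr: int, offset: int, write_data: bytes):
        entry = self.local_cache.get(line_addr)
        if entry is None:
            # Cache miss: forward the partial write to HBM.
            self.hbm_writes.append((line_addr, offset, write_data))
            return "forwarded-to-HBM (miss)"
        state, data = entry
        if state is State.MODIFIED:
            # Line is already dirty locally: merge the partial write into it.
            merged = data[:offset] + write_data + data[offset + len(write_data):]
            self.local_cache[line_addr] = (State.MODIFIED, merged)
            return "merged-in-local-cache"
        # Exclusive or shared: forward to HBM instead of dirtying the line.
        self.hbm_writes.append((line_addr, offset, write_data))
        return "forwarded-to-HBM (E/S)"

ccu = CacheControlUnit()
ccu.local_cache[0x40] = (State.MODIFIED, bytes(64))
assert ccu.partial_write(0x40, 8, b"\xff\xff") == "merged-in-local-cache"
assert ccu.partial_write(0x80, 0, b"\x01") == "forwarded-to-HBM (miss)"
```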

    System and method to support multiple walkers per command

    Publication Number: US10776897B1

    Publication Date: 2020-09-15

    Application Number: US16297129

    Filing Date: 2019-03-08

    Abstract: Embodiments described herein provide an apparatus comprising a processor to: configure a plurality of contexts of a command engine to execute a graphics workload comprising a plurality of walkers; allocate, from a pool of execution units of a graphics processor, a subset of execution units to each walker in the plurality of walkers based at least in part on a predetermined number of walkers configured for each context; for each context in the plurality of contexts, dispatch one or more walkers of the plurality of walkers to the execution units; and, upon dispatch of the one or more walkers of the plurality of walkers, write an opcode to a computer-readable memory indicating that the dispatch of the walker is complete, wherein the opcode comprises dependency data for the one or more walkers of the plurality of walkers. Other embodiments may be described and claimed.
