MAINTAINING HIGH TEMPORAL CACHE LOCALITY BETWEEN INDEPENDENT THREADS HAVING THE SAME ACCESS PATTERN

    公开(公告)号:US20190324757A1

    公开(公告)日:2019-10-24

    申请号:US15957695

    申请日:2018-04-19

    Abstract: Embodiments described herein provide techniques to maintain high temporal cache locality between independent threads having the same or similar memory access pattern. One embodiment provides a graphics processing unit comprising an instruction execution pipeline including hardware execution logic and a thread dispatcher to process a set of commands for execution and distribute multiple groups of hardware threads to the hardware execution logic to execute the set of commands. The thread dispatcher can be configured to concurrently distribute a first group of the multiple groups of hardware threads to the hardware execution logic and withhold distribution of additional hardware threads for the set of commands until after the first group completes execution.

    FORWARD PROGRESS GUARANTEE USING SINGLE-LEVEL SYNCHRONIZATION AT INDIVIDUAL THREAD GRANULARITY

    公开(公告)号:US20230153176A1

    公开(公告)日:2023-05-18

    申请号:US17528386

    申请日:2021-11-17

    CPC classification number: G06F9/522 G06F9/48

    Abstract: An apparatus to facilitate facilitating forward progress guarantee using single-level synchronization at individual thread granularity is disclosed. The apparatus includes a processor comprising a barrier synchronization hardware circuitry to assign a set of global named barrier identifiers (IDs) to individual execution threads of a plurality of execution threads and synchronize execution of the individual execution threads on a single level via the set of global named barrier IDs; and a plurality of processing resources to execute the plurality of execution threads and comprising divergent barrier scheduling hardware circuitry to facilitate execution flow switching from a first divergent branch executed by a first thread to a second divergent branch executed by a second thread, the execution flow switching performed responsive to the first thread stalling to wait on a named barrier of the set of global named barrier IDs.

Patent Agency Ranking