DYNAMIC CROSS-ARCHITECTURE APPLICATION ADAPTION

    Publication (Announcement) Number: US20230014741A1

    Publication (Announcement) Date: 2023-01-19

    Application Number: US17945290

    Application Date: 2022-09-15

    Abstract: Embodiments described herein are generally directed to improving performance of high-performance computing (HPC) or artificial intelligence (AI) workloads on cluster computer systems. According to one embodiment, a section of a high-performance computing (HPC) or artificial intelligence (AI) workload executing on a cluster computer system is identified as significant to a figure of merit (FOM) of the workload. An alternate placement among multiple heterogeneous compute resources of a node of the cluster computer system is determined for a portion of the section currently executing on a given compute resource of the multiple heterogeneous compute resources. After predicting an improvement to the FOM based on the alternate placement, the portion is relocated to the alternate placement.
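    The adaptation loop outlined in this abstract (identify a section significant to the figure of merit, evaluate alternate placements among a node's heterogeneous compute resources, and relocate a portion only when an FOM improvement is predicted) can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the Section/Portion structures, the fom_share significance threshold, and the predicted_fom cost model are all hypothetical names introduced here.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Portion:
        name: str
        resource: str                           # compute resource the portion currently runs on
        # toy cost model: predicted FOM contribution per candidate resource (assumed, not from the patent)
        predicted_fom: Dict[str, float] = field(default_factory=dict)

    @dataclass
    class Section:
        name: str
        fom_share: float                        # fraction of the workload FOM attributed to this section
        portions: List[Portion] = field(default_factory=list)

    def adapt(sections: List[Section], resources: List[str], significance: float = 0.5) -> None:
        """Relocate portions of FOM-significant sections when a better placement is predicted."""
        for section in sections:
            if section.fom_share < significance:          # 1. keep only sections significant to the FOM
                continue
            for portion in section.portions:
                current = portion.predicted_fom[portion.resource]
                # 2. determine an alternate placement among the node's heterogeneous resources
                best = max(resources, key=lambda r: portion.predicted_fom.get(r, current))
                # 3. relocate only when the model predicts an FOM improvement
                if portion.predicted_fom.get(best, current) > current:
                    portion.resource = best

    In this sketch the relocation is just a field update driven by a static table; the abstract describes relocating executing work between compute resources on a live cluster node.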

    MAINTAINING HIGH TEMPORAL CACHE LOCALITY BETWEEN INDEPENDENT THREADS HAVING THE SAME ACCESS PATTERN

    Publication (Announcement) Number: US20190324757A1

    Publication (Announcement) Date: 2019-10-24

    Application Number: US15957695

    Application Date: 2018-04-19

    Abstract: Embodiments described herein provide techniques to maintain high temporal cache locality between independent threads having the same or similar memory access pattern. One embodiment provides a graphics processing unit comprising an instruction execution pipeline including hardware execution logic and a thread dispatcher to process a set of commands for execution and distribute multiple groups of hardware threads to the hardware execution logic to execute the set of commands. The thread dispatcher can be configured to concurrently distribute a first group of the multiple groups of hardware threads to the hardware execution logic and withhold distribution of additional hardware threads for the set of commands until after the first group completes execution.
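    The gating behavior described here (distribute the first group of hardware threads, withhold the remaining groups for the same command set until that group completes, so later threads with the same access pattern reuse the cache lines the first group warmed) can be modelled loosely with host threads in Python. This is only an illustrative sketch: real dispatch is performed by GPU hardware, and the gated_dispatch name and the list-of-groups representation are assumptions made here.

    import threading
    from typing import Callable, List

    def gated_dispatch(groups: List[List[Callable[[], None]]]) -> None:
        """Run the first thread group concurrently; withhold all later groups for the
        same command set until the first group completes, mimicking the dispatcher
        behavior in the abstract so later threads find warm cache lines."""
        if not groups:
            return
        first = [threading.Thread(target=fn) for fn in groups[0]]
        for t in first:
            t.start()                            # the first group is distributed immediately
        for t in first:
            t.join()                             # gate: wait until the first group completes
        for group in groups[1:]:                 # withheld groups are released only now
            workers = [threading.Thread(target=fn) for fn in group]
            for t in workers:
                t.start()
            for t in workers:
                t.join()

    Serializing the groups is the point of the technique: the later groups trade concurrency for temporal cache locality with the group that ran just before them.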
