-
Publication No.: US20230014741A1
Publication Date: 2023-01-19
Application No.: US17945290
Application Date: 2022-09-15
Applicant: Intel Corporation
Inventor: Antonio Valles, Rebecca David
IPC: G06F9/50
Abstract: Embodiments described herein are generally directed to improving performance of high-performance computing (HPC) or artificial intelligence (AI) workloads on cluster computer systems. According to one embodiment, a section of a high-performance computing (HPC) or artificial intelligence (AI) workload executing on a cluster computer system is identified as significant to a figure of merit (FOM) of the workload. An alternate placement among multiple heterogeneous compute resources of a node of the cluster computer system is determined for a portion of the section currently executing on a given compute resource of the multiple heterogeneous compute resources. After predicting an improvement to the FOM based on the alternate placement, the portion is relocated to the alternate placement.
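The relocation decision described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the `Placement` type, the `choose_relocation` function, the `min_gain` threshold, and the assumption that a higher FOM is better are all hypothetical, introduced only for illustration. The FOM predictions are taken as given inputs, since the abstract does not say how they are produced.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Placement:
    """A candidate compute resource for one portion of a workload section."""
    resource: str         # hypothetical label, e.g. "cpu" or "gpu"
    predicted_fom: float  # predicted figure of merit if the portion runs here


def choose_relocation(
    current: Placement,
    alternates: Sequence[Placement],
    min_gain: float = 0.0,
) -> Placement:
    """Return the best alternate if it is predicted to improve the FOM.

    Assumes a higher FOM is better. The portion stays on `current` unless
    the predicted gain exceeds `min_gain`.
    """
    best = max(alternates, key=lambda p: p.predicted_fom, default=current)
    if best.predicted_fom - current.predicted_fom > min_gain:
        return best
    return current


if __name__ == "__main__":
    current = Placement("cpu", 100.0)
    candidates = [Placement("gpu", 130.0), Placement("fpga", 90.0)]
    print(choose_relocation(current, candidates).resource)
```

The `min_gain` threshold is an assumed design choice: it stands in for the cost of moving work, so that a small predicted gain does not trigger a relocation.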
-
2.
Publication No.: US20190324757A1
Publication Date: 2019-10-24
Application No.: US15957695
Application Date: 2018-04-19
Applicant: Intel Corporation
Inventor: James Valerio, Ben Ashbaugh, Pradeep Ramani, Rebecca David, Sabareesh Ganapathy, Hashem Hashemi
Abstract: Embodiments described herein provide techniques to maintain high temporal cache locality between independent threads having the same or similar memory access pattern. One embodiment provides a graphics processing unit comprising an instruction execution pipeline including hardware execution logic and a thread dispatcher to process a set of commands for execution and distribute multiple groups of hardware threads to the hardware execution logic to execute the set of commands. The thread dispatcher can be configured to concurrently distribute a first group of the multiple groups of hardware threads to the hardware execution logic and withhold distribution of additional hardware threads for the set of commands until after the first group completes execution.
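The dispatch policy in this abstract is implemented in GPU hardware, but its ordering can be illustrated with a loose software analogy. In the sketch below, `dispatch_in_waves` is a hypothetical name, and Python threads stand in for hardware threads: each group is distributed concurrently, and the next group is withheld until every thread in the current group has finished.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def dispatch_in_waves(groups: Sequence[Sequence[Callable[[], None]]]) -> None:
    """Run each group of tasks concurrently, one group at a time.

    Later groups are withheld until the current group completes, so tasks
    with a similar memory access pattern run close together in time.
    """
    for group in groups:
        if not group:
            continue  # nothing to dispatch for an empty group
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            futures = [pool.submit(task) for task in group]
            for future in futures:
                future.result()  # wait for completion and surface any exception


if __name__ == "__main__":
    log = []
    # Three groups of three tasks; each task records its group number.
    groups = [[(lambda g=g: log.append(g)) for _ in range(3)] for g in range(3)]
    dispatch_in_waves(groups)
    print(log)
```

Because each group must finish before the next is submitted, the recorded group numbers never interleave across groups.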
-