-
Publication number: WO2008127623A3
Publication date: 2010-01-07
Application number: PCT/US2008004652
Application date: 2008-04-09
Applicant: APPLE INC, MUNSHI AAFTAB, SANDMEL JEREMY
Inventor: MUNSHI AAFTAB, SANDMEL JEREMY
CPC classification number: G06F9/445, G06F8/41, G06F9/4843, G06F9/5044, G06F9/541
Abstract: A method and an apparatus are described that schedule a plurality of executables in a schedule queue for concurrent execution in one or more physical compute devices, such as CPUs or GPUs. One or more executables are compiled online from a source that has an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to the scheduled executables are determined in order to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized to execute an executable in a GPU of the physical compute devices is initialized for execution in a CPU of the physical compute devices instead if the GPU is busy with graphics processing threads. Sources and existing executables for an API function are stored in an API library so that a plurality of executables, including the existing executables and executables compiled online from the sources, can be executed in a plurality of physical compute devices.
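The scheduling mechanism this abstract describes, a queue of executables with dependency checks and a fallback from a busy GPU to a CPU, can be sketched in Python. All names here (`Executable`, `ScheduleQueue`, the device strings) are illustrative inventions, not terms from the patent itself.

```python
# Minimal sketch of a schedule queue that selects a runnable executable
# (all dependencies satisfied) and reroutes it from a busy GPU to a CPU.
# Structure and naming are hypothetical, for illustration only.
from dataclasses import dataclass, field


@dataclass
class Executable:
    name: str
    deps: set = field(default_factory=set)   # names of executables it waits on
    device: str = "GPU"                      # preferred physical compute device


class ScheduleQueue:
    def __init__(self, gpu_busy=False):
        self.pending = []         # executables waiting to run
        self.completed = set()    # names of finished executables
        self.gpu_busy = gpu_busy  # GPU occupied by graphics threads?

    def submit(self, exe):
        self.pending.append(exe)

    def select_next(self):
        """Pick the first executable whose dependencies are all complete,
        rerouting it to a CPU if its preferred GPU is busy."""
        for exe in self.pending:
            if exe.deps <= self.completed:
                self.pending.remove(exe)
                target = "CPU" if (exe.device == "GPU" and self.gpu_busy) else exe.device
                return exe, target
        return None, None

    def mark_done(self, exe):
        self.completed.add(exe.name)


q = ScheduleQueue(gpu_busy=True)
q.submit(Executable("filter", deps=set()))
q.submit(Executable("reduce", deps={"filter"}))

exe, target = q.select_next()    # "filter" has no deps; GPU busy -> CPU
q.mark_done(exe)
exe2, target2 = q.select_next()  # "reduce" becomes runnable once "filter" is done
```

The dependency test (`exe.deps <= self.completed`) stands in for the patent's dependency relations among queue elements; a real runtime would track these per memory object rather than per executable name.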
-
Publication number: WO2008127610A2
Publication date: 2008-10-23
Application number: PCT/US2008004617
Application date: 2008-04-09
Applicant: APPLE INC, MUNSHI AAFTAB, SANDMEL JEREMY
Inventor: MUNSHI AAFTAB, SANDMEL JEREMY
CPC classification number: G06F9/5027, G06F8/314, G06F8/41, G06F8/445, G06F8/458, G06F9/4843, G06F9/505, G06F9/541, G06T1/20, G06T2200/28
Abstract: A method and an apparatus are described that execute a parallel computing program, written in a programming language for a parallel computing architecture, in a system with parallel processors. The system includes a host processor, a graphics processing unit (GPU) coupled to the host processor, and a memory coupled to at least one of the host processor and the GPU. The parallel computing program is stored in the memory to allocate threads between the host processor and the GPU. The programming language includes an API that allows an application to make calls to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor. Standard data tokens in the programming language schedule a plurality of threads for execution in parallel on a plurality of processors, such as CPUs or GPUs. Extended data tokens in the programming language implement executables for the plurality of threads according to the schedules from the standard data tokens.
-
Publication number: WO2008127604A2
Publication date: 2008-10-23
Application number: PCT/US2008004606
Application date: 2008-04-09
Applicant: APPLE INC, MUNSHI AAFTAB, SANDMEL JEREMY
Inventor: MUNSHI AAFTAB, SANDMEL JEREMY
CPC classification number: G06F9/5016, G06F9/5044
Abstract: A method and an apparatus are described that allocate a stream memory and/or a local memory for a variable in an executable, loaded from a host processor to a compute processor, according to whether the compute processor supports the corresponding storage capability. The compute processor may be a graphics processing unit (GPU) or a central processing unit (CPU). Alternatively, an application running in the host processor configures storage capabilities in a compute processor, such as a CPU or GPU, to determine a memory location for accessing a variable in an executable executed by a plurality of threads in the compute processor. The configuration and allocation are based on API calls in the host processor.
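The capability-driven allocation this abstract describes, placing a variable in local memory only when the compute processor reports that storage capability and otherwise falling back to stream memory, can be sketched as a small decision function. The function name, the capability strings, and the variable representation are all assumptions made for illustration.

```python
# Sketch of capability-driven allocation: a variable is placed in the
# compute device's local memory only if the device reports that storage
# capability; otherwise it falls back to stream (global) memory.
# Names are hypothetical.
def allocate(variable, device_caps):
    """Return the memory region chosen for `variable` on a compute device
    whose supported storage capabilities are listed in `device_caps`."""
    wanted = variable.get("qualifier", "stream")
    if wanted == "local" and "local" in device_caps:
        return "local memory"
    return "stream memory"   # fall back when the capability is missing


gpu_caps = {"stream", "local"}   # e.g. a GPU with fast on-chip local storage
cpu_caps = {"stream"}            # e.g. a CPU exposing only stream memory

var = {"name": "tile", "qualifier": "local"}
loc_gpu = allocate(var, gpu_caps)   # capability present -> local memory
loc_cpu = allocate(var, cpu_caps)   # capability missing -> stream memory
```

In the patent's scheme the capability query and the resulting placement would be driven by API calls from the host processor, so the same executable can run on devices with different memory hierarchies.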
-