LOW POWER AND LOW LATENCY GPU COPROCESSOR FOR PERSISTENT COMPUTING

    Publication No.: US20210201439A1

    Publication Date: 2021-07-01

    Application No.: US17181300

    Application Date: 2021-02-22

    Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR file enables co-issuing more than one instruction per clock cycle.
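The abstract's message-driven scheduling model (host enqueues messages, coprocessor detects them and schedules the matching sub-task) can be illustrated with a minimal sketch. All names here (`GpuCoprocessorModel`, `register`, `send`, `run`) are hypothetical and for illustration only; the patent does not specify an API.

```python
import queue

class GpuCoprocessorModel:
    """Toy model of the coprocessor's self-scheduling loop: each message
    arriving on the host-facing queue triggers the matching sub-task."""

    def __init__(self):
        self.msg_queue = queue.Queue()
        self.handlers = {}   # message type -> sub-task procedure
        self.results = []

    def register(self, msg_type, sub_task):
        self.handlers[msg_type] = sub_task

    def send(self, msg_type, payload):
        # Host-processor side: enqueue a message targeting the coprocessor.
        self.msg_queue.put((msg_type, payload))

    def run(self):
        # Coprocessor side: on detecting a message, schedule its sub-task.
        while not self.msg_queue.empty():
            msg_type, payload = self.msg_queue.get()
            self.results.append(self.handlers[msg_type](payload))

cop = GpuCoprocessorModel()
cop.register("scale", lambda data: [2 * x for x in data])
cop.send("scale", [1, 2, 3])
cop.run()
print(cop.results)  # [[2, 4, 6]]
```

The data-flow-driven character of the design is what this captures: work is scheduled by arriving messages rather than by a free-running host loop.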

    Software Only Inter-Compute Unit Redundant Multithreading for GPUs
    Invention Application (Granted)

    Publication No.: US20140373028A1

    Publication Date: 2014-12-18

    Application No.: US13920524

    Application Date: 2013-06-18

    Abstract: A system, method and computer program product to execute a first and a second work-group, and compare the signature variables of the first work-group to the signature variables of the second work-group via a synchronization mechanism. The first and the second work-group are mapped to an identifier via software. This mapping ensures that the first and second work-groups execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-groups independently, the underlying computation of the first and second work-groups can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-groups are compared only at specified comparison points.
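The mechanism the abstract describes (two redundant executions of the same code on the same data, compared via signature variables only at synchronization points) can be sketched as follows. The function names and the use of a hash digest as the signature are illustrative assumptions, not details taken from the patent.

```python
import hashlib

def kernel(work_group_data):
    # Stand-in compute kernel: the same code runs in both redundant work-groups.
    return [x * x + 1 for x in work_group_data]

def signature(values):
    # Signature variable: a digest summarizing the work-group's computation.
    return hashlib.sha256(repr(values).encode()).hexdigest()

def redundant_execute(data):
    # Both work-groups are mapped (in software) to the same identifier,
    # so they execute exactly the same code on exactly the same data.
    sig_a = signature(kernel(list(data)))
    sig_b = signature(kernel(list(data)))
    # Comparison happens only at this synchronization point, which is why
    # normal execution is not slowed by per-instruction checking.
    return sig_a == sig_b

print(redundant_execute([3, 1, 4, 1, 5]))  # True when no fault occurred
```

Deferring the comparison to sparse sync points is the key performance trade-off the abstract highlights.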

    SYSTEM AND METHOD FOR PROVIDING LOW LATENCY TO APPLICATIONS USING HETEROGENEOUS PROCESSORS
    Invention Application (Granted)

    Publication No.: US20130328891A1

    Publication Date: 2013-12-12

    Application No.: US13912438

    Application Date: 2013-06-07

    Abstract: Methods, apparatuses, and computer readable media are disclosed for responding to requests. A method of responding to requests may include receiving requests comprising callback functions. The one or more requests may be received in a first memory associated with processors of a first type, which may be CPUs. The requests may be moved to a second memory. The second memory may be associated with processors of a second type, which may be GPUs. GPU threads may process the requests to determine a result for the requests, when a number of the requests is at least a threshold number. The method may include moving the results to the first memory. The method may include the CPUs executing the one or more callback functions with the corresponding result. A GPU persistent thread may check the number of requests to determine when a threshold number of requests is reached.
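The hand-off the abstract describes (requests with callbacks buffer on the CPU side, a persistent checker watches the count, and once a threshold is reached the batch is processed and results flow back through the callbacks) can be modeled with a small sketch. The class and method names are hypothetical, and the memory move is simulated with plain Python lists.

```python
class RequestBroker:
    """Toy model of the CPU/GPU hand-off: requests (with callbacks) buffer
    in 'first memory' (CPU side) until a threshold is reached, then the
    batch is processed and each callback fires with its result."""

    def __init__(self, threshold, process):
        self.threshold = threshold
        self.process = process       # stands in for the GPU-side processing
        self.cpu_buffer = []         # first memory (CPU side)

    def submit(self, payload, callback):
        self.cpu_buffer.append((payload, callback))
        self._persistent_thread_check()

    def _persistent_thread_check(self):
        # Models the GPU persistent thread polling the request count.
        if len(self.cpu_buffer) < self.threshold:
            return
        batch, self.cpu_buffer = self.cpu_buffer, []   # move to second memory
        results = self.process([p for p, _ in batch])  # GPU threads process batch
        for (_, callback), result in zip(batch, results):
            callback(result)                           # CPU executes callbacks

out = []
broker = RequestBroker(threshold=3, process=lambda xs: [x + 100 for x in xs])
for i in range(3):
    broker.submit(i, out.append)
print(out)  # [100, 101, 102]
```

Batching at a threshold amortizes the CPU-to-GPU transfer cost, which is how the design keeps per-request latency low despite the device hop.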

    Low power and low latency GPU coprocessor for persistent computing

    Publication No.: US10929944B2

    Publication Date: 2021-02-23

    Application No.: US15360057

    Application Date: 2016-11-23

    Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR file enables co-issuing more than one instruction per clock cycle.

    System and method for providing low latency to applications using heterogeneous processors
    Invention Grant (Granted)

    Publication No.: US09495718B2

    Publication Date: 2016-11-15

    Application No.: US13912438

    Application Date: 2013-06-07

    Abstract: Methods, apparatuses, and computer readable media are disclosed for responding to requests. A method of responding to requests may include receiving requests comprising callback functions. The one or more requests may be received in a first memory associated with processors of a first type, which may be CPUs. The requests may be moved to a second memory. The second memory may be associated with processors of a second type, which may be GPUs. GPU threads may process the requests to determine a result for the requests, when a number of the requests is at least a threshold number. The method may include moving the results to the first memory. The method may include the CPUs executing the one or more callback functions with the corresponding result. A GPU persistent thread may check the number of requests to determine when a threshold number of requests is reached.

    Software Only Intra-Compute Unit Redundant Multithreading for GPUs
    Invention Application (Granted)

    Publication No.: US20140368513A1

    Publication Date: 2014-12-18

    Application No.: US13920574

    Application Date: 2013-06-18

    Abstract: A system, method and computer program product to execute a first and a second work-item, and compare the signature variable of the first work-item to the signature variable of the second work-item. The first and the second work-items are mapped to an identifier via software. This mapping ensures that the first and second work-items execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-items independently, the underlying computation of the first and second work-items can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-items are compared only at specified comparison points.

    Software only inter-compute unit redundant multithreading for GPUs
    Invention Grant (Granted)

    Publication No.: US09274904B2

    Publication Date: 2016-03-01

    Application No.: US13920524

    Application Date: 2013-06-18

    Abstract: A system, method and computer program product to execute a first and a second work-group, and compare the signature variables of the first work-group to the signature variables of the second work-group via a synchronization mechanism. The first and the second work-group are mapped to an identifier via software. This mapping ensures that the first and second work-groups execute exactly the same data for exactly the same code without changes to the underlying hardware. By executing the first and second work-groups independently, the underlying computation of the first and second work-groups can be verified. Moreover, system performance is not substantially affected because the execution results of the first and second work-groups are compared only at specified comparison points.
