CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE

    Publication/Announcement No.: US20240168807A1

    Publication/Announcement Date: 2024-05-23

    Application No.: US18056949

    Application Date: 2022-11-18

    CPC classification number: G06F9/5027 G06F9/48 G06F9/522 G06F15/8046

    Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein the respective plurality of data processing units are to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to utilize a second register space of a second thread for an operand of the decoded instruction for the first thread; access the second register space of the second thread to obtain data for the operand of the decoded instruction; and perform the matrix multiplication operation for the first thread using the data for the operand from the second register space of the second thread.
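The operand path described in this abstract can be modelled in a few lines. The following is a minimal Python sketch, not the patented hardware: each thread's register space is assumed to be a dict of matrix tiles, and the `src_b_thread` field on the hypothetical `DecodedMatMul` record stands in for the claimed "indication to utilize a second register space of a second thread". All class, field, and register names here are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Matrix = List[List[float]]


@dataclass
class Thread:
    tid: int
    # This thread's private register space: register name -> matrix tile
    regs: Dict[str, Matrix] = field(default_factory=dict)


@dataclass
class DecodedMatMul:
    dst: str
    src_a: str
    src_b: str
    # Hypothetical "indication" field: if set, src_b is read from this other thread
    src_b_thread: Optional[int] = None


def matmul(a: Matrix, b: Matrix) -> Matrix:
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(len(b[0]))]
            for row in a]


def execute(inst: DecodedMatMul, thread: Thread, threads: Dict[int, Thread]) -> None:
    """Run a decoded matmul for `thread`, optionally borrowing operand B."""
    a = thread.regs[inst.src_a]
    # Cross-thread access: pick the register space that owns operand B
    owner = thread if inst.src_b_thread is None else threads[inst.src_b_thread]
    b = owner.regs[inst.src_b]
    # The result is written only to the first thread's register space
    thread.regs[inst.dst] = matmul(a, b)


# Usage: thread 0 multiplies its r0 by thread 1's r8 without copying r8 first
t0 = Thread(0, {"r0": [[1, 2], [3, 4]]})
t1 = Thread(1, {"r8": [[5, 6], [7, 8]]})
threads = {0: t0, 1: t1}
execute(DecodedMatMul("r2", "r0", "r8", src_b_thread=1), t0, threads)
print(t0.regs["r2"])  # [[19, 22], [43, 50]]
```

The point of the sketch is that the second thread's tile is read in place, so a shared operand (for example a common B matrix) need not be duplicated into every thread's registers.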

    SYSTOLIC ARRAY OF ARBITRARY PHYSICAL AND LOGICAL DEPTH

    Publication/Announcement No.: US12174783B2

    Publication/Announcement Date: 2024-12-24

    Application No.: US17304678

    Application Date: 2021-06-24

    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
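The relationship between physical and logical depth can be illustrated with a small model. This Python sketch assumes that each physical pipeline stage performs one multiply-accumulate step along the K dimension, and that an instruction whose logical depth exceeds the physical depth is handled by feeding partial sums back for another pass, giving ceil(logical depth / physical depth) passes. The function names are hypothetical, and the model ignores data skew, timing, and operand formats.

```python
import math
from typing import List, Tuple

Matrix = List[List[int]]


def pipeline_pass(acc: Matrix, a: Matrix, b: Matrix, k0: int, phys_depth: int) -> Matrix:
    """One trip through `phys_depth` physical stages; stage s adds the (k0 + s)-th term."""
    out = [row[:] for row in acc]
    for s in range(phys_depth):
        k = k0 + s
        if k >= len(b):  # trailing stages idle when logical depth % physical depth != 0
            break
        for i in range(len(a)):
            for j in range(len(b[0])):
                out[i][j] += a[i][k] * b[k][j]
    return out


def systolic_matmul(a: Matrix, b: Matrix, c: Matrix, phys_depth: int) -> Tuple[Matrix, int]:
    """Compute C + A @ B, where the logical depth is the K dimension of the instruction."""
    if phys_depth < 1:
        raise ValueError("physical depth must be at least 1")
    logical_depth = len(b)
    passes = math.ceil(logical_depth / phys_depth)
    acc = c
    for p in range(passes):
        # Partial sums from the previous pass are fed back as the next pass's accumulator
        acc = pipeline_pass(acc, a, b, p * phys_depth, phys_depth)
    return acc, passes


a = [[1, 2, 3, 4, 5, 6]]
b = [[1], [1], [1], [1], [1], [1]]
print(systolic_matmul(a, b, [[0]], phys_depth=4))  # ([[21]], 2)
```

With a logical depth of 6 and a physical depth of 4, the model makes two passes, and the last two stages of the second pass sit idle; the result is the same as a single pass through a hypothetical 8-deep array.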

    GLOBAL OPTIMAL PATH DETERMINATION UTILIZING PARALLEL PROCESSING

    Publication/Announcement No.: US20190180406A1

    Publication/Announcement Date: 2019-06-13

    Application No.: US15839640

    Application Date: 2017-12-12

    Abstract: Embodiments are generally directed to global optimal path determination utilizing parallel processing. An embodiment of an apparatus includes a central processing unit (CPU); a graphical processing unit (GPU), the GPU being capable of a plurality of processing threads; and a memory to store data for a system under evaluation, the system under evaluation including a set of nodes having a first endpoint, a second endpoint, and multiple paths between the first endpoint and the second endpoint. The apparatus is to determine a most energy efficient path between the first endpoint and the second endpoint utilizing parallel processing of a push and relabel graph cut algorithm. Performance of the push and relabel algorithm includes a plurality of process iterations, each process iteration including performance of a relabel operation, a push operation in a first direction, and a push operation in a second direction.
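For reference, the push-relabel method named in this abstract computes a maximum s-t flow, and the vertices still reachable from the source in the final residual graph form the source side of a minimum cut. The Python sketch below is a textbook sequential FIFO variant on a dense capacity matrix. It does not reproduce the parallel GPU scheme with its separate pushes in two directions, and how node energies would map to edge capacities is left as an assumption.

```python
from collections import deque
from typing import List, Set, Tuple


def push_relabel(cap: List[List[int]], s: int, t: int) -> Tuple[int, Set[int]]:
    """Max flow value and the source side of a minimum s-t cut."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]  # skew-symmetric: flow[v][u] == -flow[u][v]
    height, excess = [0] * n, [0] * n
    height[s] = n

    def residual(u: int, v: int) -> int:
        return cap[u][v] - flow[u][v]

    def push(u: int, v: int, d: int) -> None:
        flow[u][v] += d
        flow[v][u] -= d
        excess[u] -= d
        excess[v] += d

    # Preflow: saturate every edge leaving the source
    for v in range(n):
        if cap[s][v] > 0:
            push(s, v, cap[s][v])
    active = deque(v for v in range(n) if v not in (s, t) and excess[v] > 0)

    while active:
        u = active.popleft()
        while excess[u] > 0:
            pushed = False
            for v in range(n):
                # Push only "downhill" along admissible residual edges
                if residual(u, v) > 0 and height[u] == height[v] + 1:
                    was_idle = excess[v] == 0
                    push(u, v, min(excess[u], residual(u, v)))
                    if was_idle and v not in (s, t):
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u one step above its lowest residual neighbour
                height[u] = 1 + min(height[v] for v in range(n) if residual(u, v) > 0)

    # Min cut: vertices still reachable from s in the residual graph
    side, frontier = {s}, deque([s])
    while frontier:
        u = frontier.popleft()
        for v in range(n):
            if v not in side and residual(u, v) > 0:
                side.add(v)
                frontier.append(v)
    return excess[t], side


cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
print(push_relabel(cap, 0, 3))  # (5, {0})
```

Each iteration of the outer loop is a local push or relabel on one vertex, which is why the method parallelizes well: many vertices can be processed at once, as the abstract describes.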

    SYSTOLIC ARRAY OF ARBITRARY PHYSICAL AND LOGICAL DEPTH

    Publication/Announcement No.: US20220414053A1

    Publication/Announcement Date: 2022-12-29

    Application No.: US17304678

    Application Date: 2021-06-24

    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

    Divergent Control Flow for Fused EUs
    Invention application

    Publication/Announcement No.: US20170372446A1

    Publication/Announcement Date: 2017-12-28

    Application No.: US15190663

    Application Date: 2016-06-23

    Abstract: Embodiments provide support for divergent control flow in heterogeneous compute operations on a fused execution unit. One embodiment provides for a processing apparatus comprising a fused execution unit including multiple graphics execution units having a common instruction pointer; logic to serialize divergent function calls by the fused execution unit, the logic configured to compare a call target of execution channels within the fused execution unit and create multiple groups of channels, each group of channels associated with a single call target; and wherein the fused execution unit is to execute a first group of channels via a first execution unit and a second group of channels via a second execution unit.
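The serialization step can be sketched as follows. This Python model assumes a per-channel array of call targets and an execution mask, groups the enabled channels by call target, and hands the resulting single-target groups alternately to the two EUs of a fused pair. Round-robin assignment is an assumption; the abstract only states that a first group runs on a first EU and a second group on a second EU. Names such as `serialize_calls` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Schedule = List[Tuple[int, int, List[int]]]  # (execution unit, call target, channels)


def group_by_call_target(targets: List[int], mask: List[bool]) -> Dict[int, List[int]]:
    """Compare the call target of every enabled channel and bucket channels by target."""
    groups: Dict[int, List[int]] = {}
    for ch, (target, enabled) in enumerate(zip(targets, mask)):
        if enabled:
            groups.setdefault(target, []).append(ch)
    return groups


def serialize_calls(targets: List[int], mask: List[bool], num_eus: int = 2) -> Schedule:
    """Assign each single-target group to one EU of the fused unit, round-robin."""
    groups = group_by_call_target(targets, mask)
    return [(i % num_eus, target, chans) for i, (target, chans) in enumerate(groups.items())]


def run(schedule: Schedule, funcs: Dict[int, Callable[[int], int]],
        inputs: List[int]) -> List[Optional[int]]:
    results: List[Optional[int]] = [None] * len(inputs)
    for _eu, target, chans in schedule:
        # Every channel in a group calls the same function, so one instruction stream suffices
        for ch in chans:
            results[ch] = funcs[target](inputs[ch])
    return results


targets = [0x100, 0x200, 0x100, 0x300, 0x200, 0x100, 0x100, 0x300]
mask = [True] * 8
funcs = {0x100: lambda x: x + 1, 0x200: lambda x: x * 2, 0x300: lambda x: -x}
sched = serialize_calls(targets, mask)
print(sched)  # [(0, 256, [0, 2, 5, 6]), (1, 512, [1, 4]), (0, 768, [3, 7])]
print(run(sched, funcs, list(range(8))))  # [1, 2, 3, -3, 8, 6, 7, -7]
```

Because each group has exactly one call target, the shared instruction pointer of the fused unit is never asked to follow two targets at once; divergence costs extra serialized groups rather than incorrect execution.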

    SYSTOLIC ARRAY OF ARBITRARY PHYSICAL AND LOGICAL DEPTH

    Publication/Announcement No.: US20250117360A1

    Publication/Announcement Date: 2025-04-10

    Application No.: US18931412

    Application Date: 2024-10-30

    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
