Multiple register allocation sizes for threads

    Publication No.: US12210905B2

    Publication date: 2025-01-28

    Application No.: US17358650

    Filing date: 2021-06-25

    Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
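
The dispatch steps in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the `ProcessingResource` class, the `dispatch` function, the requirement that a candidate also has a free slot, and the "most free registers" selection policy are all hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ProcessingResource:
    """Hypothetical model of one processing resource and its register file."""
    name: str
    total_registers: int
    thread_slots: int
    # Maps an occupied slot index to the number of registers held by its thread.
    allocations: Dict[int, int] = field(default_factory=dict)

    def free_registers(self) -> int:
        return self.total_registers - sum(self.allocations.values())

    def free_slot(self) -> Optional[int]:
        # Return the lowest-numbered unoccupied slot, or None if all are taken.
        for slot in range(self.thread_slots):
            if slot not in self.allocations:
                return slot
        return None


def dispatch(resources: List[ProcessingResource],
             register_size: int) -> Optional[Tuple[str, int]]:
    """Place one thread; return (resource name, slot) or None if nothing fits."""
    # Identify resources with sufficient register space (and, as an
    # assumption of this sketch, at least one open thread slot).
    candidates = [r for r in resources
                  if r.free_registers() >= register_size
                  and r.free_slot() is not None]
    if not candidates:
        return None
    # Assumed selection policy: prefer the resource with the most free registers.
    chosen = max(candidates, key=lambda r: r.free_registers())
    # Select an available slot and allocate the registers to the thread.
    slot = chosen.free_slot()
    chosen.allocations[slot] = register_size
    return chosen.name, slot
```

Calling `dispatch` repeatedly with different `register_size` values shows how threads with different register footprints can share the same set of processing resources.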

    Computing efficient cross channel operations in parallel computing machines using systolic arrays

    Publication No.: US11669490B2

    Publication date: 2023-06-06

    Application No.: US17518202

    Filing date: 2021-11-03

    CPC classification number: G06F15/8046 G06F15/8007 G06F17/16 G06N20/00

    Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.
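
As a rough software analogy for the data flow described here, the sketch below walks the channels of a single source register stage by stage, skips disabled channels, and broadcasts the final value to every destination channel. The function name `cross_channel_sum` and the choice of addition as the cross-channel operation are assumptions made for illustration; the abstract does not name a specific operation.

```python
from typing import List


def cross_channel_sum(source: List[int], enabled: List[bool]) -> List[int]:
    """Reduce the enabled channels of one source register and broadcast the result."""
    if len(source) != len(enabled):
        raise ValueError("source and enabled must have one entry per channel")
    acc = 0
    # Each loop iteration stands in for one stage of the array: the stage
    # receives its input from the same source register as every other stage.
    for value, is_enabled in zip(source, enabled):
        if is_enabled:
            acc += value
        # A disabled channel is bypassed: the accumulator passes through unchanged.
    # Broadcast the final-stage result to all channels of the destination.
    return [acc] * len(source)
```

For example, with channel 1 disabled, `cross_channel_sum([1, 2, 3, 4], [True, False, True, True])` reduces only the values 1, 3, and 4.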

    Resource load balancing based on usage and power limits

    Publication No.: US10983581B2

    Publication date: 2021-04-20

    Application No.: US15859598

    Filing date: 2017-12-31

    Abstract: Methods and apparatus relating to techniques for resource load balancing based on usage and/or power limits are described. In an embodiment, resource load balancing logic causes a first resource of a processor to operate at a first frequency and a second resource of the processor to operate at a second frequency. Memory stores a plurality of frequency values. The resource load balancing logic also selects the first frequency and the second frequency based on the stored plurality of frequency values. Operation of the first resource at the first frequency and the second resource at the second frequency in turn causes the processor to operate under a power budget. The resource load balancing logic causes change to the first frequency and the second frequency in response to a determination that operation of the processor is different than the power budget. Other embodiments are also disclosed and claimed.
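
One way to picture the control loop in this abstract is a small function that nudges two frequency selections whenever measured power differs from the budget. Everything concrete below is an assumption for illustration: the `FREQ_TABLE_MHZ` values, the `rebalance` function, and the policy of slowing the less-used resource first and speeding up the busier one first are hypothetical and are not taken from the patent.

```python
from typing import Tuple

# Hypothetical stored frequency values, lowest to highest.
FREQ_TABLE_MHZ = [800, 1200, 1600, 2000]


def rebalance(idx_a: int, idx_b: int,
              measured_watts: float, budget_watts: float,
              usage_a: float, usage_b: float) -> Tuple[int, int]:
    """Return new indexes into FREQ_TABLE_MHZ for resources A and B."""
    top = len(FREQ_TABLE_MHZ) - 1
    if measured_watts > budget_watts:
        # Over budget: step down the less-used resource if it can go lower,
        # otherwise step down whichever resource still has room.
        if usage_a <= usage_b and idx_a > 0:
            idx_a -= 1
        elif idx_b > 0:
            idx_b -= 1
        elif idx_a > 0:
            idx_a -= 1
    elif measured_watts < budget_watts:
        # Headroom available: step up the busier resource if it can go higher,
        # otherwise step up whichever resource still has room.
        if usage_a >= usage_b and idx_a < top:
            idx_a += 1
        elif idx_b < top:
            idx_b += 1
        elif idx_a < top:
            idx_a += 1
    # When measured power equals the budget, both selections are left alone.
    return idx_a, idx_b
```

The returned indexes select entries from the stored table, so the two resources always run at one of the pre-stored frequency values.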

    Systolic array of arbitrary physical and logical depth

    Publication No.: US12174783B2

    Publication date: 2024-12-24

    Application No.: US17304678

    Filing date: 2021-06-24

    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
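
The relationship between physical and logical depth can be illustrated with a small model in which each logical stage performs one multiply-accumulate and the accumulator is fed back for additional passes when the instruction is deeper than the hardware. The names `passes_needed` and `run_systolic`, and the multiply-accumulate stage behavior, are assumptions of this sketch rather than details from the abstract.

```python
from typing import List


def passes_needed(logical_depth: int, physical_depth: int) -> int:
    """Number of passes through the physical pipeline (ceiling division)."""
    return (logical_depth + physical_depth - 1) // physical_depth


def run_systolic(acc: int, a_rows: List[int], b_rows: List[int],
                 physical_depth: int) -> int:
    """Run len(a_rows) logical stages on an array with physical_depth stages."""
    logical_depth = len(a_rows)
    stage = 0
    for _ in range(passes_needed(logical_depth, physical_depth)):
        # One pass through the physical pipeline stages.
        for _ in range(physical_depth):
            if stage == logical_depth:
                break  # the final pass may use only some of the physical stages
            acc += a_rows[stage] * b_rows[stage]
            stage += 1
        # The accumulator carries over as the input to the next pass.
    return acc
```

In this model the result does not depend on `physical_depth`; only the number of passes changes, for example four passes for a logical depth of 8 on a physical depth of 2.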
