Multiple register allocation sizes for threads

    Publication number: US12210905B2

    Publication date: 2025-01-28

    Application number: US17358650

    Application date: 2021-06-25

    Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
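The dispatch steps named in the abstract (determine the register size, identify processing resources with enough register space, select one, select a free thread slot, allocate registers) can be sketched as a small software model. This is a minimal illustration, not the patented implementation: the names `ProcessingResource` and `dispatch`, the rule that a candidate must also have a free slot, and the "most free registers" selection policy are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ProcessingResource:
    """Hypothetical model of one processing resource and its register file."""
    name: str
    total_registers: int
    num_slots: int
    # Maps an occupied thread slot index to the registers held by its thread.
    slots: Dict[int, int] = field(default_factory=dict)

    def free_registers(self) -> int:
        return self.total_registers - sum(self.slots.values())

    def free_slot(self) -> Optional[int]:
        # Return the lowest unoccupied slot index, or None if all are in use.
        for index in range(self.num_slots):
            if index not in self.slots:
                return index
        return None


def dispatch(resources: List[ProcessingResource],
             register_size: int) -> Optional[Tuple[str, int]]:
    """Place one thread that needs `register_size` registers.

    Returns (resource name, slot index), or None if nothing fits.
    """
    # Identify resources with sufficient register space.
    # Assumption: a candidate must also have a free thread slot.
    candidates = [
        r for r in resources
        if r.free_registers() >= register_size and r.free_slot() is not None
    ]
    if not candidates:
        return None
    # Assumed policy: prefer the resource with the most free registers.
    chosen = max(candidates, key=lambda r: r.free_registers())
    slot = chosen.free_slot()
    # Allocate the registers by recording them against the chosen slot.
    chosen.slots[slot] = register_size
    return chosen.name, slot
```

For example, with two resources of 512 registers and 4 slots each, a 256-register thread followed by a 128-register thread would land on different resources under this policy, and a later 512-register request would be rejected because neither resource has that much space left.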

    Computing efficient cross channel operations in parallel computing machines using systolic arrays

    Publication number: US11669490B2

    Publication date: 2023-06-06

    Application number: US17518202

    Application date: 2021-11-03

    CPC classification number: G06F15/8046 G06F15/8007 G06F17/16 G06N20/00

    Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.
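The data flow described in the abstract can be illustrated with a short software model: each stage takes one channel of the single source register, disabled channels are bypassed, and the result of the final stage is broadcast to every channel of the destination register. This is only a sketch under stated assumptions: the function name `cross_channel_reduce`, the serial chain of one stage per channel, and the choice of a binary reduction operator are hypothetical and are not taken from the patent.

```python
import operator
from typing import Callable, List, Sequence


def cross_channel_reduce(source: Sequence[int],
                         enabled: Sequence[bool],
                         op: Callable[[int, int], int] = operator.add) -> List[int]:
    """Model a cross-channel reduction over one source register.

    `source` holds one value per channel; `enabled` marks active channels.
    Returns the destination register with the result in every channel.
    """
    if len(source) != len(enabled):
        raise ValueError("source and enabled mask must have the same length")

    accumulator = None
    # Assumption: stage i receives channel i of the source register.
    for value, is_enabled in zip(source, enabled):
        if not is_enabled:
            # Bypass: a disabled channel passes the running value through.
            continue
        accumulator = value if accumulator is None else op(accumulator, value)

    if accumulator is None:
        raise ValueError("at least one channel must be enabled")

    # Broadcast the final-stage result to all destination channels.
    return [accumulator] * len(source)
```

Calling `cross_channel_reduce([1, 2, 3, 4], [True, False, True, True])` skips the second channel and fills all four destination channels with 8. Passing `op=max` turns the same chain into a cross-channel maximum.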
