Supporting 8-bit floating point format operands in a computing architecture

    Publication number: US12242846B2

    Publication date: 2025-03-04

    Application number: US18618648

    Application date: 2024-03-27

    Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprising one or more sets of interconnected multipliers, shifters, and adders, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.
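
The dot product described in this abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: it assumes the instruction selects between two common 8-bit layouts (1 sign bit plus 4 exponent and 3 mantissa bits, or 5 exponent and 2 mantissa bits), neither of which is named in the abstract. The function names `decode_fp8` and `systolic_dot` and the `layer_width` parameter are hypothetical, Inf/NaN encodings are ignored, and ordinary Python floats stand in for the hardware multipliers, shifters, and adders.

```python
def decode_fp8(byte, exp_bits=4, man_bits=3):
    """Decode one 8-bit pattern (1 sign bit + exp_bits + man_bits) to a float.

    Assumption: IEEE-style bias and subnormals; Inf/NaN encodings are not handled.
    """
    if not 0 <= byte <= 0xFF or exp_bits + man_bits != 7:
        raise ValueError("expected one byte split as 1 + exp_bits + man_bits")
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if byte >> 7 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == 0:
        # Subnormal: no implicit leading one.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def systolic_dot(a_bytes, b_bytes, exp_bits=4, man_bits=3, layer_width=4):
    """Dot product of two FP8 vectors, accumulated one layer at a time."""
    if len(a_bytes) != len(b_bytes):
        raise ValueError("operand vectors must have the same length")
    acc = 0.0
    for start in range(0, len(a_bytes), layer_width):
        # One "layer": multiply a slice of operand pairs, then add the
        # partial sum to the accumulator passed on to the next layer.
        partial = sum(
            decode_fp8(a, exp_bits, man_bits) * decode_fp8(b, exp_bits, man_bits)
            for a, b in zip(a_bytes[start:start + layer_width],
                            b_bytes[start:start + layer_width])
        )
        acc += partial
    return acc
```

Passing `exp_bits=5, man_bits=2` switches the decoder to the wider-exponent layout, which loosely mirrors the idea of the decoded instruction indicating the data format.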

    COMPUTING EFFICIENT CROSS CHANNEL OPERATIONS IN PARALLEL COMPUTING MACHINES USING SYSTOLIC ARRAYS

    Publication number: US20230367740A1

    Publication date: 2023-11-16

    Application number: US18310129

    Application date: 2023-05-01

    CPC classification number: G06F15/8046 G06F15/8007 G06N20/00

    Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.
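
As an illustration of the data flow in this abstract, the sketch below models a cross-channel reduction in plain Python. It is an assumption-laden toy, not the disclosed circuit: the function name `cross_channel_reduce`, the per-channel enable mask, and the choice of one stage per channel are all hypothetical. It shows the three behaviors the abstract lists: feeding values from a single source register in at successive stages, bypassing disabled channels, and broadcasting the final result to every channel of the destination.

```python
import operator


def cross_channel_reduce(src, enabled, op):
    """Reduce the enabled channels of `src` with `op` and broadcast the result.

    src:     values held in the channels of one source register
    enabled: per-channel flags; falsy channels are bypassed
    op:      binary operation applied across channels (e.g. operator.add, max)
    """
    if len(src) != len(enabled):
        raise ValueError("src and enabled must have one entry per channel")
    acc = None
    for value, on in zip(src, enabled):
        if not on:
            continue  # disabled channel: pass the running value through unchanged
        acc = value if acc is None else op(acc, value)
    if acc is None:
        raise ValueError("no enabled channels to reduce")
    # Broadcast the final-stage result to every destination channel.
    return [acc] * len(src)


src = [3, 1, 4, 1, 5, 9, 2, 6]
mask = [1, 1, 0, 1, 1, 0, 1, 1]
total = cross_channel_reduce(src, mask, operator.add)
largest = cross_channel_reduce(src, mask, max)
```

Here the channels holding 4 and 9 are masked off, so they contribute to neither the sum nor the maximum.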

    FORWARD PROGRESS GUARANTEE USING SINGLE-LEVEL SYNCHRONIZATION AT INDIVIDUAL THREAD GRANULARITY

    Publication number: US20230153176A1

    Publication date: 2023-05-18

    Application number: US17528386

    Application date: 2021-11-17

    CPC classification number: G06F9/522 G06F9/48

    Abstract: An apparatus to facilitate a forward progress guarantee using single-level synchronization at individual thread granularity is disclosed. The apparatus includes a processor comprising barrier synchronization hardware circuitry to assign a set of global named barrier identifiers (IDs) to individual execution threads of a plurality of execution threads and synchronize execution of the individual execution threads on a single level via the set of global named barrier IDs; and a plurality of processing resources to execute the plurality of execution threads and comprising divergent barrier scheduling hardware circuitry to facilitate execution flow switching from a first divergent branch executed by a first thread to a second divergent branch executed by a second thread, the execution flow switching performed responsive to the first thread stalling to wait on a named barrier of the set of global named barrier IDs.
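
The switching behavior in this abstract can be mimicked with a small cooperative scheduler. This is a sketch under stated assumptions, not the hardware mechanism: threads are modeled as Python generators that yield the ID of the named barrier they wait on, and the names `schedule`, `branch_a`, `branch_b`, and the barrier ID `"b0"` are hypothetical. When one thread stalls at a barrier, the scheduler switches to the next runnable thread, which is what lets the second branch reach the same barrier and release both.

```python
from collections import deque


def schedule(threads, participants):
    """Run generator-based threads, switching whenever one stalls on a barrier.

    threads:      dict mapping thread name -> generator yielding barrier IDs
    participants: dict mapping barrier ID -> number of threads it waits for
    """
    trace = []
    ready = deque(threads.items())
    waiting = {}  # barrier ID -> list of (name, generator) stalled on it
    while ready:
        name, gen = ready.popleft()
        try:
            barrier_id = next(gen)  # run until the thread reaches a barrier
        except StopIteration:
            trace.append(f"{name}: done")
            continue
        trace.append(f"{name}: wait {barrier_id}")
        stalled = waiting.setdefault(barrier_id, [])
        stalled.append((name, gen))
        if len(stalled) == participants[barrier_id]:
            # Last participant arrived: release every thread on this barrier.
            ready.extend(waiting.pop(barrier_id))
    if waiting:
        raise RuntimeError(f"threads still stalled on {sorted(waiting)}")
    return trace


def branch_a(log):
    log.append("A before")
    yield "b0"
    log.append("A after")


def branch_b(log):
    log.append("B before")
    yield "b0"
    log.append("B after")


log = []
trace = schedule({"A": branch_a(log), "B": branch_b(log)}, {"b0": 2})
```

If the scheduler kept running thread A after it stalled instead of switching to B, the barrier would never see its second participant; the switch is what provides forward progress in this toy model.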

    FUSED INSTRUCTION TO ACCELERATE PERFORMANCE OF SECURE HASH ALGORITHM 2 (SHA-2) WORKLOADS IN A GRAPHICS ENVIRONMENT

    Publication number: US20220416999A1

    Publication date: 2022-12-29

    Application number: US17358897

    Application date: 2021-06-25

    Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the functional control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an XOR operation.
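
The merged rotate, shift, and XOR operations match the shape of the SHA-2 message-schedule functions σ0 and σ1 defined in FIPS 180-4. The sketch below computes them in one call, as a rough software analogue of a fused instruction. The mapping is an assumption: the abstract does not say which sub-functions exist or how they are encoded, so the function name `fused_sigma`, the `length` values 32 and 64, and the `func` values 0 and 1 are hypothetical. Only the rotate and shift amounts come from the SHA-256 and SHA-512 definitions.

```python
def rotr(x, n, width):
    """Rotate the `width`-bit value x right by n bits."""
    mask = (1 << width) - 1
    return ((x >> n) | (x << (width - n))) & mask


# (rotate, rotate, shift) amounts for the small sigma functions of
# SHA-256 (32-bit words) and SHA-512 (64-bit words), per FIPS 180-4.
_SIGMA = {
    (32, 0): (7, 18, 3),    # SHA-256 sigma0
    (32, 1): (17, 19, 10),  # SHA-256 sigma1
    (64, 0): (1, 8, 7),     # SHA-512 sigma0
    (64, 1): (19, 61, 6),   # SHA-512 sigma1
}


def fused_sigma(src, length, func):
    """Apply two rotates, one shift, and two XORs to `src` in a single step.

    length: word size in bits (hypothetical encoding: 32 or 64)
    func:   which sigma function to apply (hypothetical encoding: 0 or 1)
    """
    r1, r2, s = _SIGMA[(length, func)]
    src &= (1 << length) - 1  # truncate the operand to the selected word size
    return rotr(src, r1, length) ^ rotr(src, r2, length) ^ (src >> s)
```

Without fusion, each σ evaluation would take five separate integer operations (two rotates, one shift, two XORs); collapsing them into one table-driven call is the software counterpart of the sub-function selection the abstract describes.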
