Dual sum of quadword 16×16 multiply and accumulate

    Publication number: US12204903B2

    Publication date: 2025-01-21

    Application number: US17359522

    Application date: 2021-06-26

    Abstract: Techniques for matrix multiplication are described. In some examples, a single instruction is used that has a format with fields for an opcode, one or more fields to indicate a location of a source/destination operand, one or more fields to indicate a location of a first source operand, and one or more fields to indicate a location of a second source operand. The opcode is to indicate that execution circuitry is to: multiply values from corresponding data elements of the first and second sources, add a first subset of the multiplied values to a first value from the source/destination operand and store the result in a first data element position of the source/destination operand, and add a second subset of the multiplied values to a second value from the source/destination operand and store the result in a second data element position of the source/destination operand.
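The operation in this abstract can be illustrated with a minimal Python sketch. The function name `dual_sum_mac`, the use of plain lists for operands, and the choice of the lower and upper halves of the products as the two subsets are assumptions made for illustration. The abstract does not say which products belong to which subset, and the sketch ignores element widths and overflow.

```python
def dual_sum_mac(srcdst, src1, src2):
    """Hypothetical model of a dual-sum multiply-accumulate.

    srcdst: two accumulator values (the source/destination operand).
    src1, src2: equal-length lists of integer data elements.
    """
    if len(src1) != len(src2) or len(src1) % 2 != 0:
        raise ValueError("sources must have the same, even number of elements")

    # Multiply values from corresponding data elements of the two sources.
    products = [a * b for a, b in zip(src1, src2)]

    # Assumption: the first subset is the lower half of the products and
    # the second subset is the upper half.
    half = len(products) // 2
    first = srcdst[0] + sum(products[:half])
    second = srcdst[1] + sum(products[half:])

    # Store each sum back into its data element position.
    return [first, second]
```

For example, `dual_sum_mac([10, 20], [1, 2, 3, 4], [5, 6, 7, 8])` forms the products 5, 12, 21, and 32, then adds the first two to 10 and the last two to 20.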

    Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

    Publication number: US12204898B2

    Publication date: 2025-01-21

    Application number: US18240287

    Application date: 2023-08-30

    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, a second memory location of a second source matrix, and a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate the amount of progress in multiplying the first and second source matrices, and in storing corresponding result data to the third memory location, that has been completed prior to the interruption.
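The interrupt-and-resume behavior described here can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `matmul_resumable`, the `max_rows` parameter used to simulate an interruption, and the choice of "number of completed result rows" as the completion progress indicator are all hypothetical. The abstract does not specify the granularity or encoding of the indicator.

```python
def matmul_resumable(a, b, result, progress=0, max_rows=None):
    """Hypothetical model of an interruptible, restartable matrix multiply.

    a, b: source matrices as lists of rows.
    result: preallocated list with one slot per result row.
    progress: number of result rows already completed and stored.
    max_rows: simulated interruption after this many rows (None = never).
    Returns the updated progress indicator.
    """
    inner = len(b)
    cols = len(b[0])
    done_this_call = 0
    row = progress
    while row < len(a):
        if max_rows is not None and done_this_call >= max_rows:
            # Interrupted: report how much work is complete so that a
            # later call can resume from this row.
            return row
        result[row] = [
            sum(a[row][k] * b[k][j] for k in range(inner))
            for j in range(cols)
        ]
        row += 1
        done_this_call += 1
    return row
```

A caller would pass the returned value back in as `progress` to restart the operation, so rows that were already stored are not recomputed.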

    Memory-independent and scalable state component initialization for a processor

    Publication number: US12112178B2

    Publication date: 2024-10-08

    Application number: US17134322

    Application date: 2020-12-26

    CPC classification number: G06F9/44505 G06F9/30098 G06F9/30145

    Abstract: Systems or methods of the present disclosure may provide an initialization technique that enables the initialization of multiple states in an efficient manner. The initialization technique includes a register to track usage of state components of the processor and a decode unit to decode a state initialization instruction. The state initialization instruction indicates which of the state components are to be initialized. The initialization technique also includes an execution unit coupled with the decode unit. The execution unit, in response to the state initialization instruction, is to initialize the state components without reading another state component from memory as part of the initialization.
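A rough Python model of this idea is shown below. The class `Processor`, the bitmap layout of the tracking register, the zero initial value, and the decision to clear a component's in-use bit on initialization are all assumptions for illustration. The abstract states only that a register tracks usage and that initialization does not read another state component from memory.

```python
class Processor:
    """Hypothetical model of state components with a usage-tracking register."""

    def __init__(self, num_components):
        self.state = [0] * num_components
        self.in_use = 0  # assumed bitmap: bit i set means component i is in use

    def write_component(self, index, value):
        # Modifying a component marks it as in use.
        self.state[index] = value
        self.in_use |= 1 << index

    def init_components(self, mask):
        """Model of a state initialization instruction.

        mask: bitmap selecting the components to initialize.
        The initial value is produced directly, with no memory read.
        """
        for i in range(len(self.state)):
            if mask & (1 << i):
                self.state[i] = 0            # assumed initial configuration
                self.in_use &= ~(1 << i)     # assumption: usage bit is cleared
```

Because the selected components are set from a fixed initial value, the cost of the operation in this model does not depend on how much saved state sits in memory.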

    INSTRUCTIONS TO CONVERT FROM FP16 TO FP8
    Published application

    Publication number: US20240045684A1

    Publication date: 2024-02-08

    Application number: US17958380

    Application date: 2022-10-01

    CPC classification number: G06F9/30145 G06F9/30036 G06F9/30018

    Abstract: Techniques for converting FP16 to BF8 using bias are described. An example embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand.
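One plausible reading of a biased FP16-to-BF8 conversion is sketched below in Python. It assumes BF8 means the 1-5-2 layout (sign, 5 exponent bits, 2 mantissa bits), which shares its exponent width with FP16, so the conversion reduces to adding a bias to the low bits and keeping the upper byte. The function names, the 8-bit bias width, the one-bias-per-element layout, and the NaN handling are assumptions for illustration. The abstract does not define how the bias terms are applied.

```python
import struct


def fp16_bits(x):
    """Return the IEEE 754 half-precision bit pattern of a Python float."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_to_bf8_biased(bits16, bias8):
    """Hypothetical biased conversion of one FP16 bit pattern to BF8 (E5M2)."""
    sign = bits16 & 0x8000
    magnitude = bits16 & 0x7FFF
    if magnitude > 0x7C00:
        # NaN input: return a quiet NaN instead of rounding it to infinity.
        return (sign | 0x7E00) >> 8
    # Assumption: an 8-bit bias is added to the discarded low bits, so a
    # carry into bit 8 rounds the result up by one BF8 step.
    magnitude += bias8 & 0xFF
    return (sign | magnitude) >> 8


def convert_packed(src1, src2, biases):
    """Convert the elements of two packed sources, one bias per element."""
    elements = list(src1) + list(src2)
    return [fp16_to_bf8_biased(e, b) for e, b in zip(elements, biases)]
```

In this model, a bias of 0 truncates toward zero and a bias of `0x80` behaves like round-half-up on the magnitude. Supplying random bias values would give a form of stochastic rounding.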
