SYSTEMS, APPARATUSES, AND METHODS FOR MULTIPLICATION AND ACCUMULATION OF VECTOR PACKED SIGNED VALUES

    Publication No.: US20190102198A1

    Publication date: 2019-04-04

    Application No.: US15721616

    Filing date: 2017-09-29

    Abstract: Embodiments of systems, apparatuses, and methods for multiplication and accumulation of signed data values in a processor are described. For example, execution circuitry executes a decoded instruction to multiply selected signed data values from a plurality of packed data element positions in first and second packed data source operands to generate a plurality of first signed result values, sum the plurality of first signed result values to generate one or more second signed result values, accumulate the one or more second signed result values with one or more data values from a destination operand to generate one or more third signed result values, and store the one or more third signed result values in one or more packed data element positions in the destination operand.
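    The multiply, sum, and accumulate stages in this abstract can be modeled in a few lines. This is a minimal sketch, not the patented circuit: it assumes each destination lane consumes a group of two adjacent signed source elements, that accumulation wraps modulo 2^32 rather than saturating, and the names `packed_signed_mac` and `wrap_signed` are hypothetical.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Wrap an integer into the two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    # If the top bit is set, the pattern represents a negative number.
    return value - (1 << bits) if value >> (bits - 1) else value


def packed_signed_mac(dst, src1, src2, group=2, acc_bits=32):
    """Hypothetical model: multiply signed elements pairwise, sum each group
    of `group` products, and accumulate into the matching destination lane.

    Assumes len(src1) == len(src2) == group * len(dst).
    """
    if len(src1) != len(src2) or len(src1) != group * len(dst):
        raise ValueError("operand lengths do not match the lane layout")
    result = []
    for lane, acc in enumerate(dst):
        base = lane * group
        # First signed results: products of the selected element pairs.
        products = [src1[base + i] * src2[base + i] for i in range(group)]
        # Second signed result: sum of this lane's products.
        partial = sum(products)
        # Third signed result: accumulate with the destination value,
        # wrapping to the assumed accumulator width.
        result.append(wrap_signed(acc + partial, acc_bits))
    return result
```

    For example, `packed_signed_mac([10, -5], [1, -2, 3, 4], [5, 6, -7, 8])` gives `[3, 6]`: lane 0 adds 1·5 + (−2)·6 = −7 to 10, and lane 1 adds 3·(−7) + 4·8 = 11 to −5.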

    Apparatuses, methods, and systems for instructions for downconverting a tile row and interleaving with a register

    Publication No.: US12086595B2

    Publication date: 2024-09-10

    Application No.: US17214853

    Filing date: 2021-03-27

    CPC classification number: G06F9/3016 G06F9/30025 G06F9/30098

    Abstract: Systems, methods, and apparatuses relating to interleaving data values. An embodiment includes decoding circuitry to decode a single instruction, the instruction having one or more fields to specify an opcode, one or more fields to specify a location of a first source operand, one or more fields to specify a location of a second source operand, one or more fields to specify a location of a destination operand, and one or more fields to specify an index value to be used to index a row in the first source operand, wherein the opcode is to indicate execution circuitry is to downconvert data elements of the indexed row of the first source operand, interleave the downconverted elements with data elements of the second source operand, and store the interleaved elements in the destination operand; and execution circuitry to execute the decoded instruction according to the opcode.
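    A small model helps make the data movement concrete: one row of a two-dimensional tile is selected by index, each element is downconverted, and the results are interleaved with the elements of a second source. This sketch assumes FP32-to-FP16 downconversion with round-to-nearest, overflow to infinity, an alternating order (tile element first, then register element), and a register that already holds FP16-representable values; the abstract specifies none of these, and `to_fp16` and `downconvert_row_interleave` are hypothetical names.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value."""
    try:
        # struct's 'e' format packs/unpacks IEEE half precision.
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except OverflowError:
        # Too large for FP16: assume the result becomes a signed infinity.
        return math.copysign(math.inf, x)


def downconvert_row_interleave(tile, index, src2):
    """Hypothetical model: downconvert tile[index] and interleave it with src2.

    Output order assumed: row[0], src2[0], row[1], src2[1], ...
    """
    row = tile[index]
    if len(row) != len(src2):
        raise ValueError("row and second source must have equal length")
    out = []
    for a, b in zip(row, src2):
        out.extend((to_fp16(a), b))
    return out
```

    With `tile = [[1.0, 2.0], [1.5, 70000.0]]`, indexing row 1 and interleaving with `[5.0, 6.0]` yields `[1.5, 5.0, inf, 6.0]`, since 70000 exceeds the largest finite FP16 value (65504).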

    INSTRUCTIONS TO CONVERT FROM FP16 TO FP8
    Published invention application

    Publication No.: US20240045677A1

    Publication date: 2024-02-08

    Application No.: US17958378

    Filing date: 2022-10-01

    CPC classification number: G06F9/30025 G06F9/3016

    Abstract: Techniques for converting FP16 or FP32 data elements to FP8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed FP8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data or single-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
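    One way to see what an FP16-to-FP8 conversion involves is a bit-level model. This sketch assumes "bfloat8" denotes the E5M2 layout (1 sign, 5 exponent, 2 mantissa bits); because E5M2 shares FP16's 5-bit exponent, conversion reduces to rounding away the low 8 mantissa bits. Round-to-nearest-even, overflow to infinity, and NaN quieting are assumptions (the abstract does not state a rounding or saturation mode), FP32 inputs are not modeled, and all function names are hypothetical.

```python
import struct


def fp16_bits(x: float) -> int:
    """Return the 16-bit IEEE half-precision encoding of x."""
    return struct.unpack('<H', struct.pack('<e', x))[0]


def fp16_bits_to_e5m2(h: int) -> int:
    """Convert one FP16 bit pattern to an assumed E5M2 byte (round to nearest even)."""
    if (h & 0x7C00) == 0x7C00 and (h & 0x03FF):
        # NaN: keep sign and exponent, force the top mantissa bit so that
        # dropping the low payload bits cannot turn the NaN into infinity.
        return ((h >> 8) | 0x02) & 0xFF
    # Add just under half an output ULP, plus the kept LSB to break ties to even.
    # A carry out of the mantissa correctly bumps the exponent (up to infinity).
    lsb = (h >> 8) & 1
    return ((h + 0x7F + lsb) >> 8) & 0xFF


def e5m2_to_float(b: int) -> float:
    """Widen an E5M2 byte back to a float by placing it in the top FP16 byte."""
    return struct.unpack('<e', struct.pack('<H', (b & 0xFF) << 8))[0]


def convert_packed_fp16_to_e5m2(values):
    """Hypothetical packed conversion: one E5M2 byte per FP16 input element."""
    return bytes(fp16_bits_to_e5m2(fp16_bits(v)) for v in values)
```

    For instance, 1.125 lies exactly halfway between the E5M2 neighbors 1.0 and 1.25, so it rounds to the even encoding `0x3C` (1.0), while 1.375 rounds up to `0x3E` (1.5). A hardware implementation may instead saturate to the largest finite value; that choice is outside this sketch.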

    Apparatus and method for performing dual signed and unsigned multiplication of packed data elements

    Publication No.: US11809867B2

    Publication date: 2023-11-07

    Application No.: US17027230

    Filing date: 2020-09-21

    Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed byte data elements; a second source register to store a second plurality of packed byte data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed byte data elements of the first plurality with a corresponding packed byte data element of the second plurality to generate a plurality of products; adder circuitry to add specified sets of the products to generate temporary results for each set of products; zero-extension or sign-extension circuitry to zero-extend or sign-extend the temporary result for each set to generate an extended temporary result for each set; accumulation circuitry to combine each of the extended temporary results with a selected packed data value stored in a third source register to generate a plurality of final results; and a destination register to store the plurality of final results as a plurality of packed data elements in specified data element positions.
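    The pipeline in this abstract (byte products, per-set sums, extension, accumulation) can be sketched for a single signed-or-unsigned mode. The specifics here are assumptions: sets of four bytes per 32-bit lane, an 18-bit temporary result (wide enough for four 8×8 products in either mode), a `signed` flag choosing sign or zero extension for both inputs and the temporary, and 32-bit wraparound on accumulation. `byte_dot_accumulate`, `to_bits`, and `extend` are hypothetical names.

```python
def to_bits(value: int, bits: int) -> int:
    """Keep only the low `bits` bits (two's-complement bit pattern)."""
    return value & ((1 << bits) - 1)


def extend(pattern: int, from_bits: int, signed: bool) -> int:
    """Sign-extend (signed=True) or zero-extend a `from_bits`-bit pattern."""
    if signed and pattern >> (from_bits - 1):
        return pattern - (1 << from_bits)
    return pattern


def byte_dot_accumulate(src1, src2, src3, signed, group=4,
                        temp_bits=18, lane_bits=32):
    """Hypothetical model: per lane, multiply `group` byte pairs, sum them into
    a temporary, extend it, and add the lane value from the third source."""
    if len(src1) != len(src2) or len(src1) != group * len(src3):
        raise ValueError("operand lengths do not match the lane layout")
    out = []
    for lane, acc in enumerate(src3):
        total = 0
        for i in range(group):
            k = lane * group + i
            # Interpret each byte as signed or unsigned, then multiply.
            a = extend(to_bits(src1[k], 8), 8, signed)
            b = extend(to_bits(src2[k], 8), 8, signed)
            total += a * b
        # Temporary result held at an assumed 18-bit width, then extended.
        wide = extend(to_bits(total, temp_bits), temp_bits, signed)
        # Accumulate and wrap to the lane width.
        out.append(extend(to_bits(acc + wide, lane_bits), lane_bits, signed))
    return out
```

    The same bytes give different results per mode: with `src1 = [0xFF, 2, 3, 4]`, `src2 = [1, 1, 1, 1]`, and `src3 = [100]`, signed mode reads `0xFF` as −1 and returns `[108]`, while unsigned mode reads it as 255 and returns `[364]`.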

    Apparatus and method for performing dual signed and unsigned multiplication of packed data elements

    Publication No.: US11573799B2

    Publication date: 2023-02-07

    Application No.: US17226986

    Filing date: 2021-04-09

    Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed doubleword data elements; a second source register to store a second plurality of packed doubleword data elements; and execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply a first doubleword data element from the first source register with a second doubleword data element from the second source register to generate a first quadword product and to concurrently multiply a third doubleword data element from the first source register with a fourth doubleword data element from the second source register to generate a second quadword product; and a destination register to store the first quadword product and the second quadword product as first and second packed quadword data elements.
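    The doubleword-to-quadword multiply can be modeled briefly. The abstract does not say which doubleword positions are paired; this sketch assumes the even-indexed elements (0 and 2) of each source, the way x86 PMULDQ and PMULUDQ take the low doubleword of each quadword, plus a `signed` flag for the two interpretations. The product of two 32-bit values always fits in 64 bits, so no wrapping is needed. `interpret` and `dual_dword_multiply` are hypothetical names.

```python
def interpret(value: int, bits: int, signed: bool) -> int:
    """Read the low `bits` bits of value as a signed or unsigned integer."""
    pattern = value & ((1 << bits) - 1)
    if signed and pattern >> (bits - 1):
        return pattern - (1 << bits)
    return pattern


def dual_dword_multiply(src1, src2, signed, pairs=((0, 0), (2, 2))):
    """Hypothetical model: two doubleword multiplies producing two quadwords.

    `pairs` lists (src1 index, src2 index) for each product; the default
    even-element selection is an assumption, not taken from the patent.
    """
    products = []
    for i, j in pairs:
        a = interpret(src1[i], 32, signed)
        b = interpret(src2[j], 32, signed)
        # Exact product; always representable in 64 bits for 32-bit inputs.
        products.append(a * b)
    return products
```

    With `src1 = [0xFFFFFFFF, 9, 3, 9]` and `src2 = [2, 9, 0x80000000, 9]`, the signed interpretation gives `[-2, -6442450944]`, while the unsigned interpretation gives `[8589934590, 6442450944]`.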

    Apparatus and method for performing dual signed and unsigned multiplication of packed data elements

    Publication No.: US10802826B2

    Publication date: 2020-10-13

    Application No.: US15721412

    Filing date: 2017-09-29

    Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example, one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed byte data elements; a second source register to store a second plurality of packed byte data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to concurrently multiply each of the packed byte data elements of the first plurality with a corresponding packed byte data element of the second plurality to generate a plurality of products; adder circuitry to add specified sets of the products to generate temporary results for each set of products; zero-extension or sign-extension circuitry to zero-extend or sign-extend the temporary result for each set to generate an extended temporary result for each set; accumulation circuitry to combine each of the extended temporary results with a selected packed data value stored in a third source register to generate a plurality of final results; and a destination register to store the plurality of final results as a plurality of packed data elements in specified data element positions.
