METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR PACKED TUPLE CROSS-COMPARISON FUNCTIONALITY
    51.
    发明申请
    METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR PACKED TUPLE CROSS-COMPARISON FUNCTIONALITY 审中-公开
    方法,装置,说明和逻辑提供向量包装的十字形跨比较功能

    公开(公告)号:US20160188336A1

    公开(公告)日:2016-06-30

    申请号:US14588247

    申请日:2014-12-31

    CPC classification number: G06F9/30036 G06F9/30018 G06F9/30021 G06F9/3834

    Abstract: Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand.

    Abstract translation: 指令和逻辑提供SIMD向量填充元组交叉比较功能。 一些处理器实施例包括具有可变多个数据字段的第一和第二寄存器,每个数据字段用于存储第一数据类型的元素。 在一些实施例中,处理器执行用于向量填充元组交叉比较的SIMD指令,对于第一寄存器的元组中的数据字段的一部分的每个数据字段,将其相应元素与数据字段的相应部分的每个元素进行比较 在第二寄存器的元组中,根据相应的比较,在对应于相应的第一寄存器部分的每个未屏蔽元素的位掩码中设置对应于第二寄存器部分的每个元素的掩码位。 在一些实施例中,位掩码由第三寄存器的数据字段中的相应元素移位。 比较类型由即时操作数指示。

    INSTRUCTIONS TO CONVERT FROM FP16 TO FP8
    56.
    发明公开

    公开(公告)号:US20240045684A1

    公开(公告)日:2024-02-08

    申请号:US17958380

    申请日:2022-10-01

    CPC classification number: G06F9/30145 G06F9/30036 G06F9/30018

    Abstract: Techniques for converting FP16 to BF8 using bias are described. An example embodiment utilizes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a first source operand, one or more fields to identify a second source operand, one or more fields to identify a source/destination operand, and one or more fields for an opcode, wherein the opcode is to indicate that execution circuitry is to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision data from the identified first and second sources to packed FP8 data using bias terms from the identified source/destination operand and store the packed FP8 data into corresponding data element positions of the identified source/destination operand.

    APPARATUSES, METHODS, AND SYSTEMS FOR HASHING INSTRUCTIONS

    公开(公告)号:US20240036865A1

    公开(公告)日:2024-02-01

    申请号:US18336985

    申请日:2023-06-17

    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

    Systems and methods for performing 16-bit floating-point matrix dot product instructions

    公开(公告)号:US11614936B2

    公开(公告)日:2023-03-28

    申请号:US17216566

    申请日:2021-03-29

    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.

    APPARATUSES, METHODS, AND SYSTEMS FOR HASHING INSTRUCTIONS

    公开(公告)号:US20220147356A1

    公开(公告)日:2022-05-12

    申请号:US17537373

    申请日:2021-11-29

    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

Patent Agency Ranking