ADAPTIVE DYNAMIC DISPATCH OF MICRO-OPERATIONS

    公开(公告)号:US20230205538A1

    公开(公告)日:2023-06-29

    申请号:US17561394

    申请日:2021-12-23

    CPC classification number: G06F9/3836 G06F9/30145 G06F9/3001

    Abstract: Embodiments of apparatuses, methods, and systems for adaptive dynamic dispatch of micro-operations are disclosed. In an embodiment, an apparatus includes a plurality of redundant execution units, a dispatcher, control hardware, a first counter, and a second counter. The dispatcher is to dispatch micro-operations to one or more of the plurality of redundant execution units, the micro-operations having a plurality of micro-operation types. The first counter to generate a first count of dispatches, during a window, of micro-operations having a first type of the plurality of micro-operation types. The second counter to generate a second count of dispatches, during the window, of micro-operations having any type of the plurality of micro-operation types. The control hardware is to cause a switch between a first mode and a second mode based in part on the first count and the second count. In the first mode, the dispatcher is to dispatch micro-operations having the first type to only a subset of the plurality of redundant execution units. In the second mode, the dispatcher is to dispatch micro-operations having the first type to all of the plurality of redundant execution units.

    SYSTEMS, APPARATUSES, AND METHODS FOR ADDITION OF PARTIAL PRODUCTS

    公开(公告)号:US20230048998A1

    公开(公告)日:2023-02-16

    申请号:US17964964

    申请日:2022-10-13

    Abstract: Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of a M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, add of results from these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element divided by N.

    Systems and methods to perform floating-point addition with selected rounding

    公开(公告)号:US11175891B2

    公开(公告)日:2021-11-16

    申请号:US16370966

    申请日:2019-03-30

    Abstract: Disclosed embodiments relate to performing floating-point addition with selected rounding. In one example, a processor includes circuitry to decode and execute an instruction specifying locations of first and second floating-point (FP) sources, and an opcode indicating the processor is to: bring the FP sources into alignment by shifting a mantissa of the smaller source FP operand to the right by a difference between their exponents, generating rounding controls based on any bits that escape; simultaneously generate a sum of the FP sources and of the FP sources plus one, the sums having a fuzzy-Jbit format having an additional Jbit into which a carry-out, if any, select one of the sums based on the rounding controls, and generate a result comprising a mantissa-wide number of most-significant bits of the selected sum, starting with the most significant non-zero Jbit.

    Accelerator systems and methods for matrix operations

    公开(公告)号:US10942738B2

    公开(公告)日:2021-03-09

    申请号:US16368973

    申请日:2019-03-29

    Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry 134 and the TMU provides the RS circuitry with notification of successfully executed operations and allows the RS circuitry to cancel operations where the operations are associated with branch mispredictions and/or non-retired speculatively executed instructions.

    APPARATUSES, METHODS, AND SYSTEMS FOR HASHING INSTRUCTIONS

    公开(公告)号:US20200310802A1

    公开(公告)日:2020-10-01

    申请号:US16370459

    申请日:2019-03-29

    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

Patent Agency Ranking