Methods, systems, and apparatuses to optimize partial flag updating instructions via dynamic two-pass execution in a processor

    Publication number: US12039329B2

    Publication date: 2024-07-16

    Application number: US17134108

    Filing date: 2020-12-24

    CPC classification number: G06F9/223 G06F9/30145

    Abstract: Systems, methods, and apparatuses relating to circuitry to implement dynamic two-pass execution of a partial flag updating instruction in a processor are described. In one embodiment, a hardware processor core includes a decoder circuit to decode instructions into a set of one or more micro-operations, an execution circuit to execute the micro-operations decoded for the instructions, a data register to store data, a flag register to store a plurality of flags, and a reservation station circuit coupled between the decoder circuit and the execution circuit, the reservation station circuit to, in response to an indicator bit set to a multiple pass mode for a single micro-operation in a reservation station entry, perform a first dispatch of the single micro-operation to the execution circuit, when a source data operand in the data register is ready for execution and a source flag operand in the flag register is not ready for execution, to generate a data resultant, and a second dispatch of the single micro-operation to the execution circuit when both the source data operand in the data register and the source flag operand in the flag register are ready for execution to generate a flag resultant based on one or more of the plurality of flags in the flag register.
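The dispatch rule in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the names `RSEntry` and `next_dispatch` are hypothetical, and the case where both operands are already ready before any dispatch (handled here as one combined dispatch) is not covered by the abstract.

```python
from dataclasses import dataclass


@dataclass
class RSEntry:
    """Hypothetical reservation station entry holding one micro-operation."""
    uop: str
    multi_pass: bool          # indicator bit: True means multiple pass mode
    data_done: bool = False   # data resultant already produced
    flag_done: bool = False   # flag resultant already produced


def next_dispatch(entry, data_ready, flag_ready):
    """Return 'data', 'flag', 'data+flag', or None for the current cycle."""
    if entry.flag_done or not data_ready:
        return None  # finished, or the data source is still pending
    if flag_ready:
        # Both sources ready: produce the flag resultant.
        entry.flag_done = True
        if entry.data_done:
            return "flag"          # second dispatch of the same micro-op
        entry.data_done = True
        return "data+flag"         # assumption: one combined dispatch
    if entry.multi_pass and not entry.data_done:
        # First dispatch: data source ready, flag source not ready.
        entry.data_done = True
        return "data"
    return None  # single-pass entries wait for every source


entry = RSEntry(uop="partial_flag_update", multi_pass=True)
passes = [
    next_dispatch(entry, data_ready=True, flag_ready=False),
    next_dispatch(entry, data_ready=True, flag_ready=False),
    next_dispatch(entry, data_ready=True, flag_ready=True),
]
```

In this model the data resultant becomes available one step before the flag source arrives, which is the benefit the abstract attributes to the two dispatches.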

    Methods, systems, and apparatuses for out-of-order access to a shared microcode sequencer by a clustered decode pipeline

    Publication number: US11907712B2

    Publication date: 2024-02-20

    Application number: US17033649

    Filing date: 2020-09-25

    CPC classification number: G06F9/223 G06F9/382 G06F9/3802 G06F9/3822 G06F9/3844

    Abstract: Systems, methods, and apparatuses relating to circuitry to implement out-of-order access to a shared microcode sequencer by a clustered decode pipeline are described. In one embodiment, a hardware processor core includes a first decode cluster comprising a plurality of decoder circuits, a second decode cluster comprising a plurality of decoder circuits, a fetch circuit to fetch a first block of instructions and send the first block of instructions to the first decode cluster for decoding, and fetch a second block of instructions younger in program order than the first block of instructions and send the second block of instructions to the second decode cluster for decoding, a microcode sequencer comprising a memory that stores a plurality of micro-operations, and an arbitration circuit to arbitrate access by the first decode cluster and the second decode cluster to a shared read port of the memory, wherein the arbitration circuit is to allow the second decode cluster decoding the second block of instructions access to the shared read port of the memory instead of the first decode cluster decoding the first block of instructions when an instruction of the second block of instructions has a number of corresponding micro-operations in the microcode sequencer below an arbitration threshold.
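The arbitration rule can be sketched as a single decision function. This is a minimal model, assuming each cluster's request is represented by the number of micro-operations its instruction needs from the microcode sequencer (or `None` when it has no request); the function name `arbitrate` is hypothetical, and the behavior when only one cluster requests the port is an assumption not stated in the abstract.

```python
def arbitrate(older_uops, younger_uops, threshold):
    """Pick which decode cluster gets the shared read port this cycle.

    older_uops / younger_uops: micro-op counts for the pending instruction of
    the cluster decoding the older / younger block, or None if not requesting.
    """
    if younger_uops is not None and younger_uops < threshold:
        # Short microcode flow in the younger block: let it go first.
        return "younger"
    if older_uops is not None:
        return "older"
    if younger_uops is not None:
        # Assumption: with no competing request, the younger cluster proceeds.
        return "younger"
    return None


grant_short = arbitrate(older_uops=10, younger_uops=2, threshold=4)
grant_long = arbitrate(older_uops=10, younger_uops=8, threshold=4)
```

Here `grant_short` goes to the younger cluster because its flow is below the threshold, while `grant_long` falls back to the older cluster.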

    Apparatuses, methods, and systems for memory disambiguation

    Publication number: US10067762B2

    Publication date: 2018-09-04

    Application number: US15201218

    Filing date: 2016-07-01

    Abstract: Apparatuses, methods, and systems relating to memory disambiguation are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, an execution unit to execute the decoded instruction, a retirement unit to retire an executed instruction in program order, and a memory disambiguation circuit to allocate an entry in a memory disambiguation table for a first load instruction that is to be flushed for a memory ordering violation, the entry comprising a counter value and an instruction pointer for the first load instruction.
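The table allocation described here can be modeled as follows. This is a sketch only: the abstract states that an entry holds a counter value and an instruction pointer, but it does not say how the counter is initialized or used later, so `initial_counter`, the class names, and the choice to key the table by instruction pointer are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class MDTEntry:
    ip: int        # instruction pointer of the flushed load
    counter: int   # counter value; its later use is not specified in the abstract


class MemoryDisambiguationTable:
    """Hypothetical software model of the table, keyed by load IP."""

    def __init__(self, initial_counter=0):
        self.initial_counter = initial_counter  # assumption: arbitrary start value
        self.entries = {}

    def on_ordering_violation_flush(self, load_ip):
        """Allocate an entry for a load flushed for a memory ordering violation."""
        return self.entries.setdefault(
            load_ip, MDTEntry(ip=load_ip, counter=self.initial_counter)
        )

    def lookup(self, load_ip):
        return self.entries.get(load_ip)


table = MemoryDisambiguationTable()
allocated = table.on_ordering_violation_flush(0x401000)
```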

    Methods, systems, and apparatuses for scalable port-binding for asymmetric execution ports and allocation widths of a processor

    Publication number: US12190157B2

    Publication date: 2025-01-07

    Application number: US17033739

    Filing date: 2020-09-26

    Abstract: Systems, methods, and apparatuses relating to circuitry to implement scalable port-binding for asymmetric execution ports and allocation widths of a processor are described. In one embodiment, a hardware processor core includes a decoder circuit to decode instructions into sets of one or more micro-operations, an instruction decode queue to store the sets of one or more micro-operations, a plurality of different types of execution circuits that each comprise a respective input port and a respective input queue, and an allocation circuit comprising a plurality of allocation lanes coupled to the instruction decode queue and to the input ports of the plurality of different types of execution circuits, wherein the allocation circuit is to, for an input of micro-operations on the plurality of allocation lanes, generate a sorted list of occupancy of the input queues of each input port, generate a pre-binding mapping of the input ports of the plurality of different types of execution circuits to the plurality of allocation lanes in a circular order according to the sorted list, when a type of micro-operation from an allocation lane does not match a type of execution circuit of an input port in the pre-binding mapping, slide the pre-binding mapping so that the input port maps to a next allocation lane having a matching type of micro-operation to generate a final mapping of the input ports of the plurality of different types of execution circuits to the plurality of allocation lanes, and bind the input ports of the plurality of different types of execution circuits to the plurality of allocation lanes according to the final mapping.
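The sort, pre-bind, and slide steps can be sketched in software as below. This is an illustrative model under several assumptions, not the claimed circuit: ports are ranked least-occupied first, each port binds to at most one lane per call, and the tuple layout and the name `bind_ports` are hypothetical.

```python
def bind_ports(ports, lanes):
    """Map execution input ports to allocation lanes.

    ports: list of (name, kind, queue_occupancy) tuples.
    lanes: list of micro-op kinds, one per allocation lane.
    Returns a list giving the bound port name (or None) for each lane.
    """
    n_lanes = len(lanes)
    final = [None] * n_lanes
    if n_lanes == 0:
        return final
    # Step 1: sorted list of input-queue occupancy (assumed ascending).
    ranked = sorted(ports, key=lambda p: p[2])
    for slot, (name, kind, _) in enumerate(ranked):
        # Step 2: pre-binding assigns ranked ports to lanes in circular order.
        start = slot % n_lanes
        # Step 3: slide to the next free lane whose micro-op kind matches.
        for step in range(n_lanes):
            lane = (start + step) % n_lanes
            if final[lane] is None and lanes[lane] == kind:
                final[lane] = name
                break
    return final


ports = [("alu0", "alu", 3), ("alu1", "alu", 1), ("mem0", "mem", 2)]
mapping = bind_ports(ports, ["mem", "alu", "alu"])
```

In this example the least-occupied port `alu1` is pre-bound to lane 0, which carries a `mem` micro-op, so it slides to lane 1; `mem0` then wraps around to lane 0 and `alu0` takes lane 2.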
