Abstract:
A method includes, in a processor, processing a sequence of pre-compiled instructions by an instruction pipeline of the processor. A first block of instructions is identified in the instructions flowing via the pipeline. The first block includes at least first and second conditional branch instructions that conditionally diverge execution of the instructions into a plurality of flow-control traces that differ from one another in multiple instructions and converge at a given instruction. A second block of instructions, which is logically equivalent to the first block but replaces the plurality of flow-control traces by a reduced set of one or more flow-control traces, having fewer flow-control traces than the first block, is created by the processor at runtime. The pipeline is caused to execute the second block instead of the first block.
Abstract:
A method includes, in a processor that processes instructions of program code, processing one or more of the instructions in a first segment of the instructions by a first hardware thread. Upon detecting that an instruction defined as a parallelization point has been fetched for the first thread, a second hardware thread is invoked to process at least one of the instructions in a second segment of the instructions, at least partially in parallel with processing of the instructions of the first segment by the first hardware thread, in accordance with a specification of register access that is indicative of data dependencies between the first and second segments.
Abstract:
A method includes, in a processor having a pipeline, fetching instructions of program code at run-time, in an order that is different from an order-of-appearance of the instructions in the program code. The instructions are divided into segments having segment identifiers (IDs). An event, which warrants flushing of instructions starting from an instruction belonging to a segment, is detected. In response to the event, at least some of the instructions in the segment that are subsequent to the instruction, and at least some of the instructions in one or more subsequent segments that are subsequent to the segment, are flushed from the pipeline based on the segment IDs.
Abstract:
A method includes, in a processor, processing a sequence of pre-compiled instructions by an instruction pipeline of the processor. A first block of instructions is identified in the instructions flowing via the pipeline. The first block includes a conditional branch instruction that conditionally diverges execution of the instructions into at least first and second flow-control traces that differ from one another in multiple instructions and converge at a given instruction that is again common to the first and second flow-control traces. A second block of instructions, which is logically equivalent to the first block but replaces the first and second flow-control traces by a single flow-control trace, is created by the processor at runtime. The pipeline is caused to execute the second block instead of the first block.
Abstract:
A method includes, in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions include symbolic expressions that specify memory addresses in an external memory in terms of one or more register names. At least first and second load instructions that access a same memory address in the external memory are identified in the program code, based on respective formats of the memory addresses specified in the symbolic expressions of the load instructions. An outcome of at least one of the load instructions is assigned to be served from an internal memory in the processor.
Abstract:
A processor includes a processing pipeline including multiple hardware threads and configured to execute software code instructions that are stored in a memory, along with multiple registers, configured to be read and written to by the processing pipeline during execution of the instructions. A monitoring unit monitors the instructions in the processing pipeline and records respective monitoring tables indicating the registers accessed in processing the instructions in different sequences of the instructions, and parallelizes among the hardware threads of the processor, using the respective monitoring tables, execution of repetitions of at least first sequences of the instructions. The monitoring unit is configured to evaluate a termination criterion based on the monitored instructions while monitoring the processing and recording the respective monitoring tables, and upon meeting the termination criterion, to terminate the monitoring before completion of the recording of the respective monitoring tables for at least second sequences of the instructions.
Abstract:
A method includes, in a processor that executes instructions of program code, monitoring instructions of a repetitive sequence of the instructions that traverses a flow-control trace so as to construct a specification of register access by the monitored instructions. Based on the specification, multiple hardware threads are invoked to execute respective segments of the repetitive instruction sequence at least partially in parallel. Monitoring of the instructions continues in at least one of the segments during execution.
Abstract:
A method includes, in a pipeline of a processor, writing instructions of a single software thread that are pending for execution into a reorder buffer (ROB) in accordance with a single write position, and incrementing the single write position to point to a location in the ROB for a next instruction to be written. The instructions, which were written in accordance with the single write position, are removed from first and second different locations in the ROB, and the first and second locations are incremented.
Abstract:
A processor includes an execution pipeline and monitoring circuity. The execution pipeline is configured to execute instructions of program code. The monitoring circuity is configured to monitor the instructions in a segment of a repetitive sequence of the instructions so as to construct a specification of register access by the monitored instructions, to parallelize execution of the repetitive sequence based on the corrected specification, and to terminate monitoring of the instructions and discard the specification in response to detecting a branch mis-prediction in the monitored instructions.
Abstract:
A method includes, in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions include symbolic expressions that specify memory addresses in an external memory in terms of one or more register names. Based on respective formats of the memory addresses specified in the symbolic expressions, a sequence of load instructions that access a predictable pattern of memory addresses in the external memory is identified. At least one cache line that includes a plurality of data values is retrieved from the external memory. Based on the predictable pattern, two or more of the data values that are requested by respective load instructions in the sequence are saved from the cache line to the internal memory. The saved data values are assigned to be served from the internal memory to one or more instructions that depend on the respective load instructions.