Abstract:
A method includes, in a processor that processes instructions of program code, processing one or more of the instructions by a first hardware thread. Upon detecting that an instruction defined as a parallelization point has been fetched for the first thread, a second hardware thread is invoked to process at least one of the instructions at least partially in parallel with processing of the instructions by the first hardware thread.
Abstract:
A method includes, in a processor, processing a sequence of pre-compiled instructions by an instruction pipeline of the processor. A first block of instructions is identified in the instructions flowing via the pipeline. The first block includes a conditional branch instruction that conditionally diverges execution of the instructions into at least first and second flow-control traces that differ from one another in multiple instructions and converge at a given instruction that is again common to the first and second flow-control traces. A second block of instructions, which is logically equivalent to the first block but replaces the first and second flow-control traces by a single flow-control trace, is created by the processor at runtime. The pipeline is caused to execute the second block instead of the first block.
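The transformation described above resembles what compilers call if-conversion. As a rough software illustration only (the abstract describes the processor doing this at runtime, and does not say how), the sketch below models a first block with two diverging traces and a logically equivalent second block with a single trace. The register names, the arithmetic in each trace, and the `select` helper are all hypothetical.

```python
def run_branching(regs):
    """First block: a conditional branch splits execution into two traces."""
    r = dict(regs)
    if r["r0"] > 0:                 # conditional branch
        r["r1"] = r["r2"] + 1       # trace A, instruction 1
        r["r3"] = r["r1"] * 2       # trace A, instruction 2
    else:
        r["r1"] = r["r2"] - 1       # trace B, instruction 1
        r["r3"] = r["r1"] * 4       # trace B, instruction 2
    r["r4"] = r["r3"] + r["r1"]     # instruction common to both traces again
    return r


def select(pred, a, b):
    """Branch-free choice between a and b; pred must be 0 or 1 (hypothetical helper)."""
    return pred * a + (1 - pred) * b


def run_single_trace(regs):
    """Second block: same result, but only one flow-control trace."""
    r = dict(regs)
    p = int(r["r0"] > 0)            # predicate replaces the branch
    a1 = r["r2"] + 1                # both candidate values are computed
    b1 = r["r2"] - 1
    r["r1"] = select(p, a1, b1)
    r["r3"] = select(p, a1 * 2, b1 * 4)
    r["r4"] = r["r3"] + r["r1"]
    return r
```

Both functions return the same register state for any input, which is the sense of "logically equivalent" assumed in this sketch.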
Abstract:
A method includes retrieving to a pipeline of a processor first instructions of program code from a first region in the program code. Before fully determining a flow-control path, which is to be traversed within the first region until exit from the first region, a beginning of a second region in the code that is to be processed following the first region is predicted, and second instructions begin to be retrieved to the pipeline from the second region. The retrieved first instructions and second instructions are processed by the pipeline.
Abstract:
A processor includes an instruction pipeline and control circuitry. The instruction pipeline is configured to process instructions of program code. The control circuitry is configured to monitor the processed instructions at run-time, to construct an invocation data structure comprising multiple entries, wherein each entry (i) specifies an initial instruction that is a target of a branch instruction, (ii) specifies a portion of the program code that follows one or more possible flow-control traces beginning from the initial instruction, and (iii) specifies, for each possible flow-control trace specified in the entry, a next entry that is to be processed following processing of that possible flow-control trace, and to configure the instruction pipeline to process segments of the program code, by continually traversing the entries of the invocation data structure.
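One way to picture the invocation data structure is as a table keyed by initial instruction, where each entry lists its possible traces and the entry that follows each trace. The sketch below is a software model under that assumption; the class names, field names, addresses, and the `choose` callback are illustrative and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Trace:
    instructions: List[int]        # (ii) portion of code followed by this trace
    next_entry: Optional[int]      # (iii) entry processed after this trace; None ends


@dataclass
class Entry:
    initial: int                   # (i) initial instruction, a branch target
    traces: Dict[str, Trace]       # one item per possible flow-control trace


def traverse(table: Dict[int, Entry], start: int,
             choose: Callable[[Entry], str],
             max_steps: int = 100) -> List[Tuple[int, str, List[int]]]:
    """Walk the entries and collect the code segments handed to the pipeline."""
    segments = []
    current: Optional[int] = start
    steps = 0
    while current is not None and steps < max_steps:
        entry = table[current]
        trace_id = choose(entry)               # stands in for trace prediction
        trace = entry.traces[trace_id]
        segments.append((entry.initial, trace_id, trace.instructions))
        current = trace.next_entry
        steps += 1
    return segments


# Hypothetical three-entry table.
table = {
    0x100: Entry(0x100, {
        "taken": Trace([0x100, 0x104], next_entry=0x120),
        "not_taken": Trace([0x100, 0x104, 0x108], next_entry=0x200),
    }),
    0x120: Entry(0x120, {"only": Trace([0x120, 0x124], next_entry=0x200)}),
    0x200: Entry(0x200, {"only": Trace([0x200], next_entry=None)}),
}


def choose(entry: Entry) -> str:
    """Toy policy: prefer the 'taken' trace when an entry has one."""
    return "taken" if "taken" in entry.traces else "only"


segments = traverse(table, 0x100, choose)
```

The `max_steps` bound exists only so the toy traversal terminates on cyclic tables; the abstract describes continual traversal.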
Abstract:
A method includes, in a processor that processes instructions of program code, processing a first segment of the instructions. One or more destination registers are identified in the first segment using an approximate specification of register access by the instructions. Respective values of the destination registers are made available to a second segment of the instructions only upon verifying that the values are valid for readout by the second segment in accordance with the approximate specification. The second segment is processed at least partially in parallel with processing of the first segment, using the values made available from the first segment.
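The abstract does not say what form the approximate specification takes. The sketch below assumes, purely for illustration, that it records an expected number of writes per destination register, and that a value is released to the second segment only once that many writes have been observed in the first segment. The class and method names are hypothetical.

```python
class SegmentForwarder:
    """Toy model of releasing register values from one segment to the next."""

    def __init__(self, expected_writes):
        # Assumed specification: register name -> expected number of writes.
        self.expected = dict(expected_writes)
        self.seen = {reg: 0 for reg in self.expected}
        self.values = {}

    def write(self, reg, value):
        """Record a register write performed by the first segment."""
        if reg in self.expected:
            self.seen[reg] += 1
        self.values[reg] = value

    def read_for_next_segment(self, reg):
        """Return the value if it is considered valid for readout, else None."""
        if reg in self.expected and self.seen[reg] >= self.expected[reg]:
            return self.values[reg]
        return None
```

With `SegmentForwarder({"r1": 2})`, the value of `r1` stays unavailable after the first write and becomes readable after the second.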
Abstract:
A method includes, in a processor that executes instructions of program code, monitoring the instructions in a segment of a repetitive sequence of the instructions so as to construct a specification of register access by the monitored instructions. In response to detecting a branch mis-prediction in the monitored instructions, the specification is corrected so as to compensate for the branch mis-prediction. Execution of the repetitive sequence is parallelized based on the corrected specification.
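To make the correction step concrete, the sketch below assumes the specification is a per-register write count built as instructions are monitored, and that a mis-prediction is compensated by subtracting the writes of instructions that followed the mis-predicted branch. This is one plausible reading, not necessarily the mechanism the abstract has in mind; sequence numbers and all names are hypothetical.

```python
from collections import Counter


class SpecBuilder:
    """Builds a register-write specification and repairs it after a flush."""

    def __init__(self):
        self.writes = Counter()    # register name -> number of monitored writes
        self.log = []              # (sequence number, register) per monitored write

    def monitor(self, seq, dest_reg):
        """Account for one monitored instruction that writes dest_reg."""
        self.writes[dest_reg] += 1
        self.log.append((seq, dest_reg))

    def on_mispredict(self, branch_seq):
        """Undo the contribution of instructions younger than the branch."""
        kept = []
        for seq, reg in self.log:
            if seq > branch_seq:
                self.writes[reg] -= 1      # wrong-path write, remove it
            else:
                kept.append((seq, reg))
        self.log = kept
        for reg in [r for r, count in self.writes.items() if count == 0]:
            del self.writes[reg]

    def spec(self):
        """Return the current specification as a plain dict."""
        return dict(self.writes)
```

After the correction, monitoring resumes on the correct path and the resulting counts reflect only instructions that actually belong to the segment.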