Abstract:
A method includes, in a processor that executes instructions of program code, monitoring instructions of a repetitive sequence of the instructions that traverses a flow-control trace so as to construct a specification of register access by the monitored instructions. Based on the specification, multiple hardware threads are invoked to execute respective segments of the repetitive instruction sequence at least partially in parallel. Monitoring of the instructions continues in at least one of the segments during execution.
Abstract:
A method which includes, in a processor that processes instructions of program code, processing one or more of the instructions by a first hardware thread. Upon detecting that an instruction defined as a parallelization point has been fetched for the first thread, a second hardware thread is invoked to process at least one of the instructions at least partially in parallel with processing of the instructions by the first hardware thread.
Abstract:
A method includes, in a processor, processing a sequence of pre-compiled instructions by an instruction pipeline of the processor. A first block of instructions is identified in the instructions flowing via the pipeline. The first block includes a conditional branch instruction that conditionally diverges execution of the instructions into at least first and second flow-control traces that differ from one another in multiple instructions and converge at a given instruction that is again common to the first and second flow-control traces. A second block of instructions, which is logically equivalent to the first block but replaces the first and second flow-control traces by a single flow-control trace, is created by the processor at runtime. The pipeline is caused to execute the second block instead of the first block.
Abstract:
A method includes, in a processor that processes instructions of program code, processing a first segment of the instructions. One or more destination registers are identified in the first segment using an approximate specification of register access by the instructions. Respective values of the destination registers are made available to a second segment of the instructions only upon verifying that the values are valid for readout by the second segment in accordance with the approximate specification. The second segment is processed at least partially in parallel with processing of the first segment, using the values made available from the first segment.
Abstract:
A method includes, in a processor that processes instructions of program code, processing one or more of the instructions by a first hardware thread. Upon detecting that an instruction defined as a parallelization point has been fetched for the first thread, a second hardware thread is invoked to process at least one of the instructions at least partially in parallel with processing of the instructions by the first hardware thread.
Abstract:
A method includes, in a processor that executes instructions of program code, monitoring the instructions in a segment of a repetitive sequence of the instructions so as to construct a specification of register access by the monitored instructions. In response to detecting a branch mis-prediction in the monitored instructions, the specification is corrected so as to compensate for the branch mis-prediction. Execution of the repetitive sequence is parallelized based on the corrected specification.
Abstract:
A processor includes a pipeline and a multi-bank Branch-Target Buffer (BTB). The pipeline is configured to process program instructions including branch instructions. The multi-bank BTB includes a plurality of BTB banks and is configured to store learned Target Addresses (TAs) of one or more of the branch instructions in the plurality of the BTB banks, to receive from the pipeline simultaneous requests to retrieve respective TAs, and to respond to the requests using the plurality of the BTB banks in the same clock cycle.
Abstract:
A processor includes a pipeline and a multi-bank Branch-Target Buffer (BTB). The pipeline is configured to process program instructions including branch instructions. The multi-bank BTB includes a plurality of BTB banks and is configured to store learned Target Addresses (TAs) of one or more of the branch instructions in the plurality of the BTB banks, to receive from the pipeline simultaneous requests to retrieve respective TAs, and to respond to the requests using the plurality of the BTB banks in the same clock cycle.
Abstract:
A method which includes, in a processor that processes instructions of program code, processing one or more of the instructions in a first segment of the instructions by a first hardware thread. Upon detecting that an instruction defined as a parallelization point has been fetched for the first thread, a second hardware thread is invoked to process at least one of the instructions in a second segment of the instructions, at least partially in parallel with processing of the instructions of the first segment by the first hardware thread, in accordance with a specification of register access that is indicative of data dependencies between the first and second segments.
Abstract:
A processor includes a processing pipeline including multiple hardware threads and configured to execute software code instructions that are stored in a memory, along with multiple registers, configured to be read and written to by the processing pipeline during execution of the instructions. A monitoring unit monitors the instructions in the processing pipeline and records respective monitoring tables indicating the registers accessed in processing the instructions in different sequences of the instructions, and parallelizes among the hardware threads of the processor, using the respective monitoring tables, execution of repetitions of at least first sequences of the instructions. The monitoring unit is configured to evaluate a termination criterion based on the monitored instructions while monitoring the processing and recording the respective monitoring tables, and upon meeting the termination criterion, to terminate the monitoring before completion of the recording of the respective monitoring tables for at least second sequences of the instructions.