Abstract:
A method includes, in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions include symbolic expressions that specify memory addresses in an external memory in terms of one or more register names. A relationship between the memory addresses accessed by two or more of the memory-access instructions is identified, based on respective formats of the memory addresses specified in the symbolic expressions. An outcome of at least one of the memory-access instructions is assigned to be served from an internal memory in the processor, based on the identified relationship.
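As an illustration only, the format-based identification described above might be sketched as follows. The ARM-like textual syntax (`[r6]`, `[r6, #4]`), the regular expression, and the function names `parse_address` and `group_by_format` are assumptions made for this sketch and are not taken from the abstract.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) textual form of a symbolic address: "[r6]" or "[r6, #8]".
_ADDR = re.compile(r"\[\s*(r\d+)\s*(?:,\s*#(-?\d+))?\s*\]")


def parse_address(expr):
    """Return (base_register, offset) for a symbolic address expression."""
    m = _ADDR.fullmatch(expr.strip())
    if m is None:
        raise ValueError(f"unrecognized address format: {expr!r}")
    return m.group(1), int(m.group(2) or 0)


def group_by_format(instructions):
    """Group instruction indices whose addresses share one symbolic format.

    `instructions` is a list of (opcode, address_expression) pairs.
    Only formats used by two or more instructions are returned.
    """
    groups = defaultdict(list)
    for idx, (_opcode, expr) in enumerate(instructions):
        groups[parse_address(expr)].append(idx)
    return {fmt: idxs for fmt, idxs in groups.items() if len(idxs) > 1}


code = [("ldr", "[r6]"), ("ldr", "[r6, #4]"), ("ldr", "[r6]")]
related = group_by_format(code)  # instructions 0 and 2 share the format (r6, 0)
```

Matching formats imply matching addresses only while the base register is not written between the instructions; this sketch does not check that condition.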
Abstract:
A method includes, in a processor that processes instructions of program code, processing a first segment of the instructions. One or more destination registers are identified in the first segment using an approximate specification of register access by the instructions. Respective values of the destination registers are made available to a second segment of the instructions only upon verifying that the values are valid for readout by the second segment in accordance with the approximate specification. The second segment is processed at least partially in parallel with processing of the first segment, using the values made available from the first segment.
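One possible reading of the validity check can be sketched as follows, under the assumption (not stated in the abstract) that the approximate specification records how many times each destination register is expected to be written in the first segment. The function name `ready_registers` is hypothetical.

```python
from collections import Counter


def ready_registers(expected_writes, observed_dsts):
    """Return the registers whose values may be released to the second segment.

    expected_writes: dict mapping register name -> expected number of writes
                     in the first segment (the assumed approximate specification).
    observed_dsts:   destination registers of the instructions processed so far.
    A register is released only once its expected final write has been observed.
    """
    counts = Counter(observed_dsts)
    return {reg for reg, n in expected_writes.items() if counts[reg] == n}


spec = {"r1": 2, "r3": 1}
early = ready_registers(spec, ["r1", "r3"])        # r1 still has a pending write
late = ready_registers(spec, ["r1", "r3", "r1"])   # both final writes observed
```

In this sketch a count that exceeds the expected value never matches, which stands in for the case where the actual execution deviates from the approximate specification.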
Abstract:
A method includes, in a processor that executes instructions of program code, identifying a region of the code containing one or more segments of the instructions that are at least partially repetitive. The instructions in the region are monitored, and an approximate specification of register access by the monitored instructions is constructed for the region. Execution of the segments in the region is parallelized using the specification.
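A minimal sketch of how such a specification could be constructed from monitored instructions is given below. The three register classes and the per-register write count are assumptions chosen for illustration; the abstract does not define the contents of the specification.

```python
def build_register_spec(segment):
    """Classify register usage in one monitored segment.

    segment: list of (destination_registers, source_registers) tuples,
             in program order.
    Returns a dict: register -> {"class": ..., "writes": ...}, where class is
      "read_only"          read but never written in the segment,
      "read_then_written"  first read, later written,
      "written_first"      written before any read.
    """
    spec = {}
    for dsts, srcs in segment:
        # Sources are handled first: an instruction reads before it writes.
        for reg in srcs:
            spec.setdefault(reg, {"class": "read_only", "writes": 0})
        for reg in dsts:
            entry = spec.setdefault(reg, {"class": "written_first", "writes": 0})
            if entry["class"] == "read_only":
                entry["class"] = "read_then_written"
            entry["writes"] += 1
    return spec


loop_body = [
    (("r1",), ("r1", "r2")),  # r1 = r1 + r2
    (("r3",), ("r1",)),       # r3 = f(r1)
    (("r3",), ("r3", "r4")),  # r3 = r3 + r4
]
spec = build_register_spec(loop_body)
```

Registers classified as `written_first` carry no dependency into the segment, which is the kind of information a parallelizing mechanism would need.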
Abstract:
A method includes, in a processor that processes instructions of program code, processing one or more of the instructions in a first segment of the instructions by a first hardware thread. Upon detecting that an instruction defined as a parallelization point has been fetched for the first thread, a second hardware thread is invoked to process at least one of the instructions in a second segment of the instructions, at least partially in parallel with processing of the instructions of the first segment by the first hardware thread, in accordance with a specification of register access that is indicative of data dependencies between the first and second segments.
Abstract:
A processor includes a processing pipeline, which includes multiple hardware threads and is configured to execute software code instructions that are stored in a memory, and multiple registers, which are configured to be read and written by the processing pipeline during execution of the instructions. A monitoring unit monitors the instructions in the processing pipeline and records respective monitoring tables indicating the registers accessed in processing different sequences of the instructions. Using the respective monitoring tables, the monitoring unit parallelizes, among the hardware threads of the processor, execution of repetitions of at least first sequences of the instructions. The monitoring unit is configured to evaluate a termination criterion based on the monitored instructions while monitoring the processing and recording the respective monitoring tables and, upon meeting the termination criterion, to terminate the monitoring before completion of the recording of the respective monitoring tables for at least second sequences of the instructions.
Abstract:
Methods for up/down fusion and/or pseudo-fusion of micro-operations are performed in a hardware processor configured to execute program code. A mergeable pair of micro-operations is identified in a sequence of micro-operations of the program code. The pair of micro-operations includes a first micro-operation for performing a first function and a non-consecutive second micro-operation for performing a second function. The first micro-operation precedes the second micro-operation in the sequence of micro-operations being processed. The first micro-operation is merged into the second micro-operation to create a third micro-operation, which performs both the first function and the second function. In up/down fusion, the third micro-operation is dispatched instead of the first micro-operation or instead of the second micro-operation, depending on whether fuse-up or fuse-down is performed. In pseudo-fusion, the first micro-operation is retained in the sequence of micro-operations and the second micro-operation is replaced with the third micro-operation.
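The pseudo-fusion case can be illustrated with a simplified software model. The `Uop` record, the mergeability conditions, and the function name `pseudo_fuse` are assumptions of this sketch; the abstract does not specify how a mergeable pair is identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    dst: str      # destination register
    op: str       # name of the function performed
    srcs: tuple   # source operands (registers or immediates)


def pseudo_fuse(seq, i, j):
    """Return a new sequence in which seq[j] is replaced by a merged uop.

    seq[i] (the first uop) is retained; the merged uop reads the first uop's
    sources directly, so it no longer waits for the first uop's result.
    """
    first, second = seq[i], seq[j]
    if j <= i + 1:
        raise ValueError("the pair must be non-consecutive, first before second")
    if first.dst not in second.srcs:
        raise ValueError("second uop does not consume the first uop's result")
    if first.dst in first.srcs:
        raise ValueError("first uop overwrites one of its own sources")
    for k in range(i + 1, j):
        # Assumed safety condition: intervening uops leave the first uop's
        # sources and destination untouched.
        if seq[k].dst == first.dst or seq[k].dst in first.srcs:
            raise ValueError("an intervening uop clobbers a needed register")
    merged_srcs = tuple(s for s in second.srcs if s != first.dst) + first.srcs
    merged = Uop(second.dst, f"{first.op}+{second.op}", merged_srcs)
    return seq[:j] + [merged] + seq[j + 1:]


seq = [
    Uop("r1", "add", ("r2", "#4")),   # first uop: r1 = r2 + 4
    Uop("r5", "mov", ("r6",)),        # unrelated intervening uop
    Uop("r3", "ldr", ("r1",)),        # second uop: r3 = mem[r1]
]
fused = pseudo_fuse(seq, 0, 2)        # fused[2] computes r3 = mem[r2 + 4]
```

In up/down fusion, by contrast, only one micro-operation would remain in place of the pair; that variant is not modeled here.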
Abstract:
A processor includes a pipeline and control circuitry. The pipeline is configured to process instructions of program code and includes one or more fetch units. The control circuitry is configured to predict at run-time one or more future flow-control traces to be traversed in the program code and to define, based on the predicted flow-control traces, two or more regions of the program code from which instructions are to be fetched, wherein the number of regions is greater than the number of fetch units. The control circuitry is further configured to instruct the pipeline to fetch instructions alternately from the two or more regions of the program code using the one or more fetch units, and to process the fetched instructions.
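The alternation among regions can be sketched as a simple schedule. Round-robin ordering is an assumption of this sketch (the abstract says only that fetching alternates), and `fetch_schedule` is a hypothetical name.

```python
from itertools import cycle


def fetch_schedule(regions, num_fetch_units, cycles):
    """Assign code regions to fetch units, cycle by cycle, in round-robin order.

    regions:         identifiers of the predicted code regions.
    num_fetch_units: number of fetch units; must be smaller than len(regions).
    cycles:          number of fetch cycles to schedule.
    Returns a list with one entry per cycle; each entry lists the region
    served by each fetch unit in that cycle.
    """
    if len(regions) <= num_fetch_units:
        raise ValueError("expected more regions than fetch units")
    order = cycle(regions)
    return [[next(order) for _ in range(num_fetch_units)] for _ in range(cycles)]


schedule = fetch_schedule(["A", "B", "C"], num_fetch_units=2, cycles=3)
```

With three regions and two fetch units, every region is visited twice over three cycles, so no region is starved even though the regions outnumber the fetch units.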
Abstract:
A method includes, in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions include symbolic expressions that specify memory addresses in an external memory in terms of one or more register names. Based on respective formats of the memory addresses specified in the symbolic expressions, a sequence of load instructions that access a predictable pattern of memory addresses in the external memory is identified. At least one cache line that includes a plurality of data values is retrieved from the external memory. Based on the predictable pattern, two or more of the data values that are requested by respective load instructions in the sequence are saved from the cache line to the internal memory. The saved data values are assigned to be served from the internal memory to one or more instructions that depend on the respective load instructions.
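To make the cache-line step concrete, the sketch below assumes the predictable pattern is a fixed stride and computes which loads in the sequence fall within the cache line fetched for the first load. The 64-byte line size, the `max_loads` bound, and the function name `loads_served_by_line` are assumptions, not details from the abstract.

```python
def loads_served_by_line(first_addr, stride, line_size=64, max_loads=16):
    """Return the addresses of consecutive strided loads within one cache line.

    first_addr: address accessed by the first load in the sequence.
    stride:     address difference between successive loads (may be negative).
    line_size:  cache-line size in bytes (assumed value).
    max_loads:  upper bound on the number of loads considered.
    """
    line_start = first_addr - first_addr % line_size
    line_end = line_start + line_size
    addrs = []
    addr = first_addr
    while line_start <= addr < line_end and len(addrs) < max_loads:
        addrs.append(addr)
        addr += stride
    return addrs


# Loads at 0x1008, 0x1018, 0x1028 and 0x1038 share the line starting at 0x1000.
same_line = loads_served_by_line(0x1008, 16)
```

The values at the returned addresses are the ones that could be copied from the retrieved line into the internal memory ahead of the later loads.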
Abstract:
A method includes, in a processor, processing program code that includes memory-access instructions, wherein at least some of the memory-access instructions include symbolic expressions that specify memory addresses in an external memory in terms of one or more register names. At least a store instruction and a subsequent load instruction that access the same memory address in the external memory are identified, based on respective formats of the memory addresses specified in the symbolic expressions. An outcome of at least one of the memory-access instructions is assigned to be served to one or more instructions that depend on the load instruction, from an internal memory in the processor.
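The pairing of a store with a subsequent load can be sketched as a single pass over the code. The tuple encoding of instructions and the name `find_store_load_pairs` are assumptions of this sketch, and it deliberately ignores stores that reach the same address through a different symbolic expression.

```python
def find_store_load_pairs(program):
    """Pair each load with the latest store that uses the same symbolic address.

    program: list of tuples in program order, each one of
             ("str", src_reg, base_reg, offset)
             ("ldr", dst_reg, base_reg, offset)
             (other_op, dst_reg)   for any other register-writing instruction.
    Returns a list of (store_index, load_index) pairs.
    """
    last_store = {}  # (base_reg, offset) -> index of the most recent store
    pairs = []

    def invalidate(reg):
        # A write to a base register changes the address the format denotes.
        for key in [k for k in last_store if k[0] == reg]:
            del last_store[key]

    for idx, ins in enumerate(program):
        if ins[0] == "str":
            _, _src, base, offset = ins
            last_store[(base, offset)] = idx
        elif ins[0] == "ldr":
            _, dst, base, offset = ins
            if (base, offset) in last_store:
                pairs.append((last_store[(base, offset)], idx))
            invalidate(dst)
        else:
            invalidate(ins[1])
    return pairs


program = [
    ("str", "r1", "sp", 8),   # store r1 to [sp, #8]
    ("alu", "r2"),            # unrelated register write
    ("ldr", "r3", "sp", 8),   # same symbolic address: can be served internally
    ("alu", "sp"),            # base register changes
    ("ldr", "r4", "sp", 8),   # same format, but no longer the same address
]
pairs = find_store_load_pairs(program)
```

For each pair found, the stored value could be kept in an internal memory and supplied to the instructions that depend on the load, without waiting for the external memory.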