Abstract:
Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
Abstract:
An apparatus and method are described for executing instructions using a predicate register. For example, one embodiment of a processor comprises: a register set including a predicate register to store a set of predicate condition bits, the predicate condition bits specifying whether results of a particular predicated instruction sequence are to be retained or discarded; and predicate execution logic to execute a first predicate instruction to indicate a start of a new predicated instruction sequence by copying a condition value from a processor control register in the register set to the predicate register. In a further embodiment, the predicate condition bits in the predicate register are to be shifted in response to the first predicate instruction to free space within the predicate register for the new condition value associated with the new predicated instruction sequence.
Abstract:
A processor includes a first logic to execute an instruction stream out-of-order, the instruction stream divided into a plurality of strands, the instruction stream and each strand ordered by program order (PO). The processor also includes a second logic to determine an oldest undispatched instruction in the instruction stream and store an associated PO value of the oldest undispatched instruction as an executed instruction pointer. The instruction stream includes dispatched and undispatched instructions. The processor also includes a third logic to determine a most recently retired instruction in the instruction stream and store an associated PO value of the most recently retired instruction as a retirement pointer, a fourth logic to select a range of instructions between the retirement pointer and the executed instruction pointer, and a fifth logic to identify the range of instructions as eligible for retirement.
Abstract:
A processor includes a first logic to execute an instruction stream out-of-order, the instruction stream divided into a plurality of strands, the instruction stream and each strand ordered by program order (PO). The processor also includes a second logic to determine an oldest undispatched instruction in the instruction stream and store an associated PO value of the oldest undispatched instruction as an executed instruction pointer. The instruction stream includes dispatched and undispatched instructions. The processor also includes a third logic to determine a most recently retired instruction in the instruction stream and store an associated PO value of the most recently retired instruction as a retirement pointer, a fourth logic to select a range of instructions between the retirement pointer and the executed instruction pointer, and a fifth logic to identify the range of instructions as eligible for retirement.
Abstract:
Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
Abstract:
A processor includes logic to execute an instruction stream out-of-order. The instruction stream is divided into a plurality of strands and its instructions and those within the streams are ordered by program order (PO). The processor further includes logic to identify an oldest undispatched instruction in the instruction stream and record its associated PO as an executed instruction pointer, identify a most recently committed store instruction in the instruction stream and record its associated PO as a store commitment pointer, a search pointer with PO less than the execution instruction pointer, identify a first set of store instructions in a store buffer with PO less than the search pointer and eligible for commitment, evaluate whether the first set of store instructions is larger than a number of read ports of the store buffer, and adjust the search pointer.
Abstract:
An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.
Abstract:
In accordance with embodiments disclosed herein, there are provided methods, systems, and apparatuses for scheduling instructions in a multi-strand out-of-order processor. For example, an apparatus for scheduling instructions in a multi-strand out-of-order processor includes an out-of-order instruction fetch unit to retrieve a plurality of interdependent instructions for execution from a multi-strand representation of a sequential program listing; an instruction scheduling unit to schedule the execution of the plurality of interdependent instructions based at least in part on operand synchronization bits encoded within each of the plurality of interdependent instructions; and a plurality of execution units to execute at least a subset of the plurality of interdependent instructions in parallel.
Abstract:
Methods and apparatuses to control access to a multiple bank data cache are described. In one embodiment, a processor includes conflict resolution logic to detect multiple instructions scheduled to access a same bank of a multiple bank data cache in a same clock cycle and to grant access priority to an instruction of the multiple instructions scheduled to access a highest total of banks of the multiple bank data cache. In another embodiment, a method includes detecting multiple instructions scheduled to access a same bank of a multiple bank data cache in a same clock cycle, and granting access priority to an instruction of the multiple instructions scheduled to access a highest total of banks of the multiple bank data cache.
Abstract:
A processor includes logic to execute an instruction stream out-of-order. The instruction stream is divided into a plurality of strands and its instructions and those within the streams are ordered by program order (PO). The processor further includes logic to identify an oldest undispatched instruction in the instruction stream and record its associated PO as an executed instruction pointer, identify a most recently committed store instruction in the instruction stream and record its associated PO as a store commitment pointer, a search pointer with PO less than the execution instruction pointer, identify a first set of store instructions in a store buffer with PO less than the search pointer and eligible for commitment, evaluate whether the first set of store instructions is larger than a number of read ports of the store buffer, and adjust the search pointer.