Abstract:
An instruction preload instruction executed in a first processor instruction set operating mode is operative to correctly preload instructions in a different, second instruction set. The instructions are pre-decoded according to the second instruction set encoding in response to an instruction set preload indicator (ISPI). In various embodiments, the ISPI may be set prior to executing the preload instruction, or may comprise part of the preload instruction or the preload target address.
Abstract:
Address translation performance within a processor is improved by identifying an address that causes a boundary crossing between different pages in memory and linking address translation information associated with both memory pages. According to one embodiment of a processor, the processor comprises circuitry configured to recognize an access to a memory region crossing a page boundary between first and second memory pages. The circuitry is also configured to link address translation information associated with the first and second memory pages. Thus, responsive to a subsequent access the same memory region, the address translation information associated with the first and second memory pages is retrievable based on a single address translation.
Abstract:
A method of processing branch history information is disclosed. The method retrieves branch instructions from an instruction cache and executes the branch instructions in a plurality of pipeline stages. The method verifies that a branch instruction has been identified. The method further receives branch history information during a first pipeline stage and loads the branch history information into a first register, wherein the first register. The method further loads the branch history information into the second register during the second pipeline stage.
Abstract:
A sliding-window, block-based branch target address cache (BTAC) comprises a plurality of entries, each entry associated with a block of instructions containing at least one branch instruction having been evaluated taken, and having a tag associated with the address of the first instruction in the block. The blocks each correspond to a group of instructions fetched from memory, such as an I-cache. Where a branch instruction is included in two or more fetch groups, it is also included in two or more instruction blocks associated with BTAC entries. The sliding-window, block-based BTAC allows for storing the branch target address (BTA) of two or more taken branch instructions that fall in the same instruction block, without providing for multiple BTA storage space in each BTAC entry, by storing BTAC entries associated with different instruction blocks, each containing at least one of the taken branch instructions.
Abstract:
A processor having a multistage pipeline includes a TLB and a TLB controller. In response to a TLB miss signal, the TLB controller initiates a TLB reload, requesting address translation information from either a memory or a higher-level TLB, and placing that information into the TLB. The processor flushes the instruction having the missing virtual address, and refetches the instruction, resulting in re-insertion of the instruction at an initial stage of the pipeline above the TLB access point. The initiation of the TLB reload, and the flush/refetch of the instruction, are performed substantially in parallel, and without immediately stalling the pipeline. The refetched instruction is held at a point in the pipeline above the TLB access point until the TLB reload is complete, so that the refetched instruction generates a "hit" in the TLB upon its next access.
Abstract:
A processor includes a memory configured to store data in a plurality of pages, a TLB, and a TLB controller. The TLB is configured to search, when accessed by an instruction having a virtual address, for address translation information that allows the virtual address to be translated into a physical address of one of the plurality of pages, and to provide the address translation information if the address translation information is found within the TLB. The TLB controller is configured to determine whether a current instruction and a subsequent instruction seek access to a same page within the plurality of pages, and if so, to prevent TLB access by the subsequent instruction, and to utilize the results of the TLB access of a previous instruction for the current instruction.
Abstract:
Hazard detection is simplified by converting a conditional instruction, operative to perform an operation if a condition is satisfied, into an emissary instruction operative to evaluate the condition and an unconditional base instruction operative to perform the operation. The emissary instruction is executed, while the base instruction is halted. The emissary instruction evaluates the condition and reports the condition evaluation back to the base instruction. Based on the condition evaluation, the base instruction is either launched into the pipeline for execution, or it is discarded (or a NOP, or null instruction, substituted for it). In either case, the dependencies of following instructions may be resolved.
Abstract:
Systems and methods are directed to prefetch mechanisms involving non-equal magnitude stride values. A non-equal magnitude functional relationship between successive stride values, may be detected, wherein the stride values are based on distances between target addresses of successive load instructions. At least a next stride value for prefetching data, may be determined, wherein the next stride value is based on the non-equal magnitude functional relationship and a previous stride value. Data prefetch may be from at least one prefetch address calculated based on the next stride value and a previous target address. The non-equal magnitude functional relationship may include a logarithmic relationship corresponding to a binary search algorithm.
Abstract:
A cache fill line is received, including an index, a thread identifier, and cache fill line data. The cache is probed, using the index and a different thread identifier, for a potential duplicate cache line. The potential duplicate cache line includes cache line data and the different thread identifier. Upon the cache fill line data matching the cache line data, duplication is identified. The potential duplicate cache line is set as a shared resident cache line, and the thread share permission tag is set to a permission state.
Abstract:
A memory structure compresses a portion of a memory tag using an indexed tag compression structure. A set of higher order bits of the memory tag may be stored in the indexed tag compression structure, where the set of higher order bits are identified by an index value. A tag array stores a set of lower order bits of the memory tag and the index value identifying the entry in the tag compression structure storing the set of higher order bits of the memory tag. The memory tag may comprise at least a portion of a memory address of a data element stored in a data array.