Abstract:
In one embodiment, the present invention includes a method for accessing registers associated with a first thread while executing a second thread. In one such embodiment, the method may include preventing an instruction of the first thread that is to access a source operand from a register file of the second thread from executing if a synchronization indicator associated with the source operand indicates that a producer operation of the second thread has not completed, and executing the instruction if the synchronization indicator indicates that the producer operation has completed. Other embodiments are described and claimed.
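The abstract describes a per-operand synchronization indicator that gates a consumer instruction until the producing thread has written the operand. Below is a minimal software sketch of that idea, assuming one "ready" flag per register entry; the names SyncRegisterFile, Entry, and the 16-entry size are illustrative assumptions, not details from the patent.

```cpp
// Sketch: a register file whose entries carry a synchronization indicator.
// A consumer is prevented from executing until the producer completes.
#include <atomic>
#include <array>
#include <cstdint>
#include <thread>

struct SyncRegisterFile {
    struct Entry {
        std::atomic<bool> ready{false};  // synchronization indicator
        std::uint64_t     value{0};      // register contents
    };
    std::array<Entry, 16> regs;  // 16 entries, chosen arbitrarily

    // Producer side: write the operand, then mark the producer operation complete.
    void produce(int r, std::uint64_t v) {
        regs[r].value = v;
        regs[r].ready.store(true, std::memory_order_release);
    }

    // Consumer side: an instruction that reads regs[r] as a source operand
    // stalls while the indicator shows the producer has not completed.
    std::uint64_t consume(int r) {
        while (!regs[r].ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // prevented from executing early
        }
        return regs[r].value;
    }
};

int main() {
    SyncRegisterFile rf;
    std::thread producer([&] { rf.produce(3, 42); });
    std::uint64_t v = rf.consume(3);  // blocks until the producer completes
    producer.join();
    return v == 42 ? 0 : 1;
}
```

The release/acquire pair mirrors the hardware contract implied by the abstract: the operand value must be visible before the indicator flips to "complete."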
Abstract:
Techniques are described for providing an enhanced cache coherency protocol for a multi-core processor that includes a Speculative Request For Ownership Without Data (SRFOWD) for a portion of cache memory. With a SRFOWD, only an acknowledgement message may be provided as an answer to a requesting core; the contents of the affected cache line need not be part of the answer. The enhanced cache coherency protocol may ensure that a valid copy of the current cache line exists in case of misspeculation by the requesting core. Thus, an owner of the current copy of the cache line may maintain a copy of the old contents of the cache line. The old contents of the cache line may be discarded if speculation by the requesting core turns out to be correct. Otherwise, in case of misspeculation by the requesting core, the old contents of the cache line may be set back to a valid state.
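To make the SRFOWD exchange concrete, here is a minimal software model of the owner side of the protocol as the abstract describes it: an acknowledgement-only reply, a retained backup of the old contents, and resolution on commit or misspeculation. The State enum and the OwnerCore/CacheLine structures are illustrative assumptions; the patent describes hardware behavior.

```cpp
// Sketch: owner-side handling of a Speculative Request For Ownership
// Without Data (SRFOWD), with rollback on misspeculation.
#include <cstdint>
#include <optional>

enum class State { Invalid, Valid, Owned };

struct CacheLine {
    State        state = State::Invalid;
    std::uint8_t data  = 0;
};

struct OwnerCore {
    CacheLine                line;
    std::optional<CacheLine> backup;  // old contents kept until speculation resolves

    // Answer a SRFOWD: reply with only an acknowledgement (no data),
    // but retain the old contents of the line.
    bool handle_srfowd() {
        backup = line;                // keep a valid copy in case of misspeculation
        line.state = State::Invalid;  // ownership passes to the requester
        return true;                  // acknowledgement-only reply
    }

    // Speculation resolved by the requesting core.
    void resolve(bool speculation_correct) {
        if (speculation_correct) {
            backup.reset();           // old contents may be discarded
        } else {
            line = *backup;           // set the old contents back to a valid state
            line.state = State::Valid;
            backup.reset();
        }
    }
};

int main() {
    OwnerCore owner;
    owner.line = {State::Owned, 7};

    bool ack = owner.handle_srfowd();           // requester receives an ack, no data
    owner.resolve(/*speculation_correct=*/false);

    // After misspeculation, the old contents (7) are valid again.
    return (ack && owner.line.state == State::Valid && owner.line.data == 7) ? 0 : 1;
}
```

The design point the abstract emphasizes is visible here: because the answer carries no data, correctness under misspeculation depends entirely on the owner keeping the backup until the speculation resolves.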
Abstract:
Systems, methods, and apparatuses for decomposing a sequential program into multiple threads, executing these threads, and reconstructing the sequential execution of the threads are described. A plurality of data cache units (DCUs) store locally retired instructions of speculatively executed threads. A merging level cache (MLC) merges data from the lines of the DCUs. An inter-core memory coherency module (ICMC) globally retires instructions of the speculatively executed threads in the MLC.
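The following is a minimal software analogy of the DCU/MLC/ICMC arrangement the abstract outlines, assuming per-thread buffers of locally retired stores that an ICMC-like merge step commits to a shared structure in original program order. The LocalStore type, the seq field, and the container choices are illustrative assumptions, not the hardware design.

```cpp
// Sketch: merging locally retired stores from per-thread DCUs and globally
// retiring them in original program order, reconstructing sequential execution.
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

struct LocalStore {
    std::uint64_t seq;    // position in the original sequential program
    std::uint64_t addr;
    std::uint64_t value;
};

using DCU = std::vector<LocalStore>;                  // per-thread locally retired stores
using MLC = std::map<std::uint64_t, std::uint64_t>;   // merged address -> value

// ICMC-like step: merge the DCUs and globally retire stores in program order.
MLC globally_retire(std::vector<DCU> dcus) {
    std::vector<LocalStore> all;
    for (auto& d : dcus) all.insert(all.end(), d.begin(), d.end());
    std::sort(all.begin(), all.end(),
              [](const LocalStore& a, const LocalStore& b) { return a.seq < b.seq; });
    MLC mlc;
    for (const auto& s : all) mlc[s.addr] = s.value;  // later seq wins, as in program order
    return mlc;
}

int main() {
    // Two speculative threads each retired stores locally, out of global order.
    DCU t0 = {{0, 0x10, 1}, {2, 0x10, 3}};
    DCU t1 = {{1, 0x10, 2}, {3, 0x20, 9}};
    MLC mlc = globally_retire({t0, t1});
    return (mlc[0x10] == 3 && mlc[0x20] == 9) ? 0 : 1;
}
```

The sort-by-sequence step stands in for the global retirement the ICMC performs: whatever order the threads ran in, the merged state matches what the original sequential program would have produced.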
Abstract:
Disclosed are an apparatus and a method to manage prefetching into an instruction cache. A processor may comprise a prefetch engine; a branch prediction engine to predict the outcome of a branch; and a dynamic optimizer. The dynamic optimizer may be used to identify common instruction cache misses and to insert a prefetch instruction from the prefetch engine into the instruction cache.
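As a rough illustration of the dynamic-optimizer behavior described above, the sketch below counts instruction-cache misses per instruction address and inserts a prefetch ahead of addresses that miss often. The miss threshold, the Instr layout, and the DynamicOptimizer class are illustrative assumptions; the abstract does not specify how misses are identified or where prefetches are placed.

```cpp
// Sketch: identify common instruction cache misses and insert prefetch
// instructions ahead of them in an instruction trace.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Instr {
    std::uint64_t pc;
    bool          is_prefetch = false;
    std::uint64_t target      = 0;  // address prefetched, if is_prefetch
};

class DynamicOptimizer {
    std::unordered_map<std::uint64_t, unsigned> miss_count_;
    static constexpr unsigned kHotThreshold = 4;  // assumed tuning knob
public:
    void record_icache_miss(std::uint64_t pc) { ++miss_count_[pc]; }

    // Rewrite a trace, inserting a prefetch before each commonly missing pc
    // so the prefetch engine can pull the line into the instruction cache early.
    std::vector<Instr> optimize(const std::vector<Instr>& trace) const {
        std::vector<Instr> out;
        for (const Instr& i : trace) {
            auto it = miss_count_.find(i.pc);
            if (it != miss_count_.end() && it->second >= kHotThreshold)
                out.push_back({i.pc, /*is_prefetch=*/true, i.pc});
            out.push_back(i);
        }
        return out;
    }
};

int main() {
    DynamicOptimizer opt;
    for (int n = 0; n < 5; ++n) opt.record_icache_miss(0x400);  // common miss
    std::vector<Instr> trace = {{0x3F0}, {0x400}};
    auto rewritten = opt.optimize(trace);
    return rewritten.size() == 3 ? 0 : 1;  // a prefetch was inserted before 0x400
}
```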