Abstract:
PROBLEM TO BE SOLVED: To recover from mis-speculation in multithreaded execution. SOLUTION: Multiple thread units concurrently execute threads. The threads execute a checkpoint mask instruction to initialize memory that stores active checkpoint data, including register contents and a checkpoint mask indicating the validity of the stored register contents. As register contents change, the threads execute checkpoint write instructions to store register contents and update the checkpoint mask. The threads also execute a recovery function instruction to store a pointer to a checkpoint recovery function and, in response to mis-speculation among the threads, branch to the checkpoint recovery function. The threads then execute one or more checkpoint read instructions to copy data from a valid checkpoint storage area into the registers needed to recover a valid architectural state, from which execution may resume. COPYRIGHT: (C) 2011, JPO & INPIT
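The checkpointing scheme above can be modeled in software as follows. This is a hedged sketch, not the patented hardware: the patent describes checkpoint mask, checkpoint write, recovery function, and checkpoint read *instructions*, which are modeled here as hypothetical Python methods with illustrative names.

```python
class Checkpoint:
    """Software model of the checkpoint storage area (illustrative names)."""
    def __init__(self, num_regs):
        self.mask = [False] * num_regs  # checkpoint mask: one validity bit per register
        self.data = [0] * num_regs      # stored register contents
        self.recovery_fn = None         # pointer to the checkpoint recovery function

    def write(self, reg, value):
        """Checkpoint write: store a register's contents and mark it valid."""
        self.data[reg] = value
        self.mask[reg] = True

    def set_recovery(self, fn):
        """Recovery function instruction: store a pointer to the recovery function."""
        self.recovery_fn = fn

    def recover(self, regs):
        """On mis-speculation, 'branch' to the stored recovery function."""
        return self.recovery_fn(self, regs)

def recovery_function(ckpt, regs):
    # Checkpoint reads: copy back only registers the mask marks as valid.
    for r, valid in enumerate(ckpt.mask):
        if valid:
            regs[r] = ckpt.data[r]
    return regs

ckpt = Checkpoint(num_regs=4)
ckpt.set_recovery(recovery_function)
regs = [10, 20, 30, 40]       # a valid architectural state
ckpt.write(1, regs[1])        # checkpoint registers 1 and 3
ckpt.write(3, regs[3])
regs = [99, 99, 99, 99]       # speculative updates, later squashed
regs = ckpt.recover(regs)     # registers 1 and 3 restored from the checkpoint
```

Execution resumes from the restored state; registers never checkpointed (mask bit clear) are left untouched, which is why the mask is needed alongside the data.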
Abstract:
In one embodiment, the present invention includes a method for receiving a bus message in a first cache corresponding to a speculative access to a portion of a second cache by a second thread, and dynamically determining in the first cache if an inter-thread dependency exists between the second thread and a first thread associated with the first cache with respect to the portion. Other embodiments are described and claimed.
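The dependency check described above can be sketched as a small model: the first cache records which portions its own thread has speculatively touched, and compares incoming bus messages against that record. All class and method names here are assumptions for illustration, not from the patent.

```python
class SpeculativeCache:
    """Model of a per-thread cache that detects inter-thread dependencies."""
    def __init__(self, thread_id):
        self.thread_id = thread_id
        self.accessed_portions = set()  # portions this thread has touched

    def local_access(self, portion):
        """Record a speculative access by the cache's own thread."""
        self.accessed_portions.add(portion)

    def on_bus_message(self, other_thread, portion):
        """Dynamically determine whether an inter-thread dependency exists
        for the portion named in the bus message from another thread."""
        return (other_thread != self.thread_id
                and portion in self.accessed_portions)

c0 = SpeculativeCache(thread_id=0)
c0.local_access(portion=0x40)
conflict = c0.on_bus_message(other_thread=1, portion=0x40)   # dependency
no_conflict = c0.on_bus_message(other_thread=1, portion=0x80)
```

A detected dependency would then trigger whatever ordering or squash action the embodiment takes; that policy is outside the scope of this sketch.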
Abstract:
Techniques are described for providing an enhanced cache coherency protocol for a multi-core processor that includes a Speculative Request For Ownership Without Data (SRFOWD) for a portion of cache memory. With a SRFOWD, only an acknowledgement message may be provided as an answer to a requesting core. The contents of the affected cache line are not required to be a part of the answer. The enhanced cache coherency protocol may assure that a valid copy of the current cache line exists in case of misspeculation by the requesting core. Thus, an owner of the current copy of the cache line may maintain a copy of the old contents of the cache line. The old contents of the cache line may be discarded if speculation by the requesting core turns out to be correct. Otherwise, in case of misspeculation by the requesting core, the old contents of the cache line may be set back to a valid state.
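The SRFOWD exchange can be sketched from the owner's side: the owner answers with only an acknowledgement, retains the old line contents, and either discards or restores them once the requester's speculation resolves. Class names, states, and the ACK token are illustrative assumptions.

```python
class CacheLine:
    def __init__(self, data):
        self.data = data
        self.old_data = None     # retained copy for mis-speculation recovery
        self.state = "VALID"

class OwnerCore:
    """Model of the core that owns the current copy of the cache line."""
    def __init__(self, line):
        self.line = line

    def handle_srfowd(self):
        """Answer a Speculative Request For Ownership Without Data:
        acknowledge only, keeping the old contents as a safety copy."""
        self.line.old_data = self.line.data
        self.line.state = "SPECULATIVE_TRANSFER"
        return "ACK"             # no line data in the answer

    def resolve(self, speculation_correct):
        if speculation_correct:
            self.line.old_data = None      # discard the old contents
            self.line.state = "INVALID"    # requester now owns the line
        else:
            self.line.data = self.line.old_data  # set back to a valid state
            self.line.state = "VALID"

owner = OwnerCore(CacheLine(data=0xAB))
ack = owner.handle_srfowd()
owner.resolve(speculation_correct=False)   # requester mis-speculated
```

The point of the protocol is visible in `handle_srfowd`: because no data travels with the answer, a valid copy must survive somewhere, and here it survives as the owner's retained `old_data`.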
Abstract:
Systems, methods, and apparatuses for decomposing a sequential program into multiple threads, executing these threads, and reconstructing the sequential execution of the threads are described. A plurality of data cache units (DCUs) store locally retired instructions of speculatively executed threads. A merging level cache (MLC) merges data from the lines of the DCUs. An inter-core memory coherency module (ICMC) globally retires instructions of the speculatively executed threads in the MLC.
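A minimal model of this retirement flow is sketched below. The DCU/MLC/ICMC structures follow the abstract, but the dictionary-based modeling and the explicit `order` list that reconstructs sequential program order are assumptions made for illustration.

```python
class DCU:
    """Data cache unit: buffers the locally retired stores of one thread."""
    def __init__(self):
        self.lines = {}              # address -> value, locally retired only

    def store(self, addr, value):
        self.lines[addr] = value

class ICMC:
    """Inter-core memory coherency module: globally retires the threads'
    stores into the merging level cache (MLC) in sequential order."""
    def __init__(self):
        self.mlc = {}

    def global_retire(self, dcus, order):
        # 'order' lists (thread_id, addr) pairs in original program order,
        # so later sequential stores overwrite earlier ones in the MLC.
        for tid, addr in order:
            self.mlc[addr] = dcus[tid].lines[addr]
        return self.mlc

dcus = {0: DCU(), 1: DCU()}
dcus[0].store(0x10, "A")             # thread 0's locally retired store
dcus[1].store(0x10, "B")             # thread 1's store to the same line
icmc = ICMC()
mlc = icmc.global_retire(dcus, order=[(0, 0x10), (1, 0x10)])
```

The key separation the abstract draws is visible here: local retirement fills the per-thread DCUs speculatively, while the ICMC alone decides what becomes globally visible in the MLC.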
Abstract:
A processor comprises a processor core and a computation circuit. The processor core comprises logic to determine a set of weights for use in a computation for a convolutional neural network (CNN) and to upscale the weights using a scaling value. The computation circuit comprises logic to receive the scaling value, the set of weights, and a set of input values, wherein each input value and its associated weight have the same size. The computation circuit also comprises logic to determine results of the computations for a convolutional neural network (CNN) based on the set of weights applied to the set of input values, to downscale the results using the scaling value, to truncate the downscaled values to the fixed size, and to combine the truncated results for communication with an output for a layer of the CNN.
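The upscale/compute/downscale/truncate flow can be sketched numerically. This is a hedged model: the scaling value (256), the 8-bit fixed size, integer rounding, and clamping to the unsigned range are all illustrative choices, not values from the patent.

```python
SCALE = 256           # example scaling value (2**8), an assumption
WIDTH = 8             # example fixed size of the output values, in bits

def upscale(weights, scale=SCALE):
    """Processor core side: scale fractional weights up to integers."""
    return [int(round(w * scale)) for w in weights]

def cnn_dot(scaled_weights, inputs):
    """Computation circuit: one CNN multiply-accumulate with scaled weights."""
    return sum(w * x for w, x in zip(scaled_weights, inputs))

def downscale_and_truncate(result, scale=SCALE, width=WIDTH):
    """Scale the result back down, then truncate to the fixed size."""
    r = result // scale
    return max(0, min(r, 2**width - 1))

weights = [0.5, -0.25, 0.125]        # fractional weights for one filter tap
inputs = [10, 20, 40]
sw = upscale(weights)                # -> [128, -64, 32]
out = downscale_and_truncate(cnn_dot(sw, inputs))
# 128*10 - 64*20 + 32*40 = 1280; 1280 // 256 = 5
```

Upscaling lets the computation circuit work entirely in fixed-size integer arithmetic; downscaling and truncation return each layer's results to the fixed size before they are combined into the layer output.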
Abstract:
In one embodiment, the present invention includes a method for accessing registers associated with a first thread while executing a second thread. In one such embodiment a method may include preventing an instruction of a first thread that is to access a source operand from a register file of a second thread from executing if a synchronization indicator associated with the source operand indicates incompletion of a producer operation of the second thread, and executing the instruction if the synchronization indicator indicates completion of the producer operation of the second thread. Other embodiments are described and claimed.
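The gating behavior above can be modeled with a per-register flag: the consumer's instruction is prevented while the flag shows the producer operation incomplete, and executes once it is set. The class, method names, and the `None` return for a blocked instruction are hypothetical.

```python
class SharedRegisterFile:
    """Model of a register file readable across threads, with one
    synchronization indicator per register."""
    def __init__(self, num_regs):
        self.values = [0] * num_regs
        self.sync = [False] * num_regs   # synchronization indicators

    def produce(self, reg, value):
        """Producer operation of the second thread."""
        self.values[reg] = value
        self.sync[reg] = True            # mark the producer complete

    def consume(self, reg):
        """Consumer instruction of the first thread: prevented from
        executing until the indicator shows the producer completed."""
        if not self.sync[reg]:
            return None                  # instruction does not execute yet
        return self.values[reg]

rf = SharedRegisterFile(4)
blocked = rf.consume(2)      # producer incomplete: instruction prevented
rf.produce(2, 17)
value = rf.consume(2)        # producer complete: instruction executes
```

In hardware the prevented instruction would stall or replay rather than return a sentinel; the flag-check-before-read is the essential mechanism.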
Abstract:
Disclosed are an apparatus and method to manage instruction cache prefetching for an instruction cache. A processor may comprise: a prefetch engine; a branch prediction engine to predict the outcome of a branch; and a dynamic optimizer. The dynamic optimizer may control identifying common instruction cache misses and inserting prefetch instructions, via the prefetch engine, into the instruction cache.
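The optimizer's role can be sketched as a miss counter with a threshold: addresses that miss in the instruction cache often enough get a prefetch inserted. The threshold value, data structures, and modeling of the prefetch engine as a set are assumptions for illustration.

```python
MISS_THRESHOLD = 3   # illustrative cutoff for a "common" i-cache miss

class DynamicOptimizer:
    """Model of a dynamic optimizer that identifies common instruction
    cache misses and inserts prefetches for them."""
    def __init__(self):
        self.miss_counts = {}
        self.prefetched = set()          # stands in for the prefetch engine

    def on_icache_miss(self, addr):
        self.miss_counts[addr] = self.miss_counts.get(addr, 0) + 1
        if self.miss_counts[addr] >= MISS_THRESHOLD:
            self.insert_prefetch(addr)

    def insert_prefetch(self, addr):
        """Insert a prefetch so this line reaches the i-cache before
        the fetch next needs it (modeled as membership in a set)."""
        self.prefetched.add(addr)

opt = DynamicOptimizer()
for _ in range(3):
    opt.on_icache_miss(0x1000)           # a common instruction cache miss
opt.on_icache_miss(0x2000)               # a rare miss: no prefetch inserted
```

Thresholding is what separates "common" misses worth a prefetch slot from one-off misses, which is the selection the abstract attributes to the dynamic optimizer.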