Abstract:
Embodiments of apparatus and methods for detecting and recovering from incorrect memory dependence speculation in an out-of-order processor are described herein. For example, one embodiment of a method comprises: executing a first load instruction; detecting when the first load instruction experiences a bad store-to-load forwarding event during execution; tracking the occurrences of bad store-to-load forwarding events experienced by the first load instruction during execution; controlling enablement of an S-bit in the first load instruction based on the tracked occurrences; and generating a plurality of load operations responsive to an enabled S-bit in the first load instruction, wherein execution of the plurality of load operations produces a result equivalent to that from the execution of the first load instruction.
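The steps of this method can be illustrated with a small software model. The sketch below is only an assumption-laden illustration, not the claimed hardware: the names `LoadTracker`, `split_load` and `execute_load`, the saturating counter, its threshold, the decrement on a good event, and the byte-sized sub-loads are all hypothetical choices that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LoadTracker:
    """Tracks bad store-to-load forwarding events for one load instruction."""
    threshold: int = 2   # assumed: events needed before the S-bit is enabled
    max_count: int = 3   # assumed: saturation limit of the counter
    count: int = 0

    def record(self, bad_forward: bool) -> None:
        # Count up on a bad forwarding event; counting down otherwise is an
        # assumption that lets the S-bit be disabled again later.
        if bad_forward:
            self.count = min(self.count + 1, self.max_count)
        else:
            self.count = max(self.count - 1, 0)

    @property
    def s_bit(self) -> bool:
        return self.count >= self.threshold


def split_load(addr: int, size: int, chunk: int = 1) -> List[Tuple[int, int]]:
    """Return (address, size) pairs of smaller loads covering the original load."""
    end = addr + size
    return [(a, min(chunk, end - a)) for a in range(addr, end, chunk)]


def execute_load(memory: bytes, addr: int, size: int, s_bit: bool) -> int:
    """Execute one load, or several smaller loads when the S-bit is enabled."""
    parts = split_load(addr, size) if s_bit else [(addr, size)]
    data = b"".join(memory[a:a + n] for a, n in parts)
    return int.from_bytes(data, "little")
```

In this model the split loads return the same value as the single load, which mirrors the requirement that the plurality of load operations produce an equivalent result.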
Abstract:
A method of operating a device with a processor includes receiving control flow data, the control flow data including block identifiers for blocks of instructions, destination identifiers for one or more of the blocks of instructions, and annotations for the blocks of instructions. The method further includes determining a destination identifier for a current instruction block based on the control flow data; identifying an annotation associated with the current instruction block based on the control flow data; and performing at least one of: modifying resources used by a processor; or tracking execution of the blocks of instructions based on one or more of the annotation or the destination identifier. Optimisation of a processing device may be performed according to the annotations by way of reducing the number of components used. Tracking of the execution of the blocks of instructions may be used to determine if a hard error in memory or a soft error in execution has occurred, with reference to an annotation indicating the number of instructions within a current instruction block.
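One way to picture the control flow data is as a table keyed by block identifier. The sketch below is a hypothetical illustration: the table layout, the `num_instructions` annotation key, and the `check_transition` helper are assumptions. It only flags mismatches and does not attempt to classify them as hard or soft errors, since the abstract does not say how that distinction is made.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BlockEntry:
    destinations: List[int]                      # block identifiers that may follow
    annotations: Dict[str, int] = field(default_factory=dict)


# Hypothetical control flow table keyed by block identifier.
CONTROL_FLOW: Dict[int, BlockEntry] = {
    1: BlockEntry([2, 3], {"num_instructions": 4}),
    2: BlockEntry([4], {"num_instructions": 2}),
    3: BlockEntry([4], {"num_instructions": 6}),
    4: BlockEntry([], {"num_instructions": 1}),
}


def check_transition(table: Dict[int, BlockEntry], current: int,
                     executed: int, next_block: int) -> List[str]:
    """Compare observed execution of one block against the control flow data."""
    entry = table[current]
    problems = []
    expected = entry.annotations.get("num_instructions")
    if expected is not None and executed != expected:
        problems.append("instruction count mismatch")
    if next_block not in entry.destinations:
        problems.append("unexpected destination")
    return problems
```

An empty result means the observed block length and successor both agree with the table; any other result is a candidate error for further handling.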
Abstract:
A mechanism for tracking the control flow in an application and performing one or more optimizations of a processing device based on the control flow of the instructions in the application is disclosed. Control flow data is generated to indicate the control flow of the instruction blocks in the application. The control flow data may include annotations that indicate whether optimizations can be performed for the various instruction blocks. The control flow data may also be used to track the execution of the instructions, to determine whether an instruction in an instruction block is assigned to a thread, a process, and/or an execution core of a processor, and to determine whether errors have occurred during the execution of the instructions.
Abstract:
A processing device comprising: a first shadow register, a second shadow register, and an instruction execution circuit that is communicatively coupled to the first shadow register and the second shadow register and is configured to: receive a sequence of instructions comprising a first local commit marker, a first global commit marker, and a first register access instruction referencing an architectural register; speculatively execute the first register access instruction to generate a speculative register state value associated with a physical register; in response to identifying the first local commit marker, store the speculative register state value in the first shadow register; and, in response to identifying the first global commit marker, store the speculative register state value in the second shadow register.
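The interplay of the two commit markers can be sketched as a toy software model. Everything here is an assumption made for illustration: the class name `ShadowRegisterModel`, the tuple encoding of instructions, and the use of dictionaries in place of hardware registers are hypothetical and not taken from the abstract.

```python
from typing import Dict, Iterable, Tuple


class ShadowRegisterModel:
    """Toy model of speculative register state with two shadow copies."""

    def __init__(self) -> None:
        self.physical: Dict[str, int] = {}       # speculative values by architectural register
        self.local_shadow: Dict[str, int] = {}   # stands in for the first shadow register
        self.global_shadow: Dict[str, int] = {}  # stands in for the second shadow register

    def run(self, instructions: Iterable[Tuple]) -> None:
        for op, *args in instructions:
            if op == "write":
                # Speculative register access: produces a speculative value.
                reg, value = args
                self.physical[reg] = value
            elif op == "local_commit":
                # Local commit marker: copy speculative state to the first shadow.
                self.local_shadow.update(self.physical)
            elif op == "global_commit":
                # Global commit marker: copy speculative state to the second shadow.
                self.global_shadow.update(self.physical)
            else:
                raise ValueError(f"unknown operation: {op}")
```

For example, running `[("write", "rax", 7), ("local_commit",)]` updates only the first shadow copy, while a later `("global_commit",)` captures whatever speculative value is current at that point.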
Abstract:
Instruction control flow tracking. A mechanism for tracking the control flow of instructions in an application and performing one or more optimizations of a processing device, based on the control flow of instructions in the application, is disclosed. Control flow data is generated to indicate the control flow of blocks of instructions in the application. The control flow data may include annotations that indicate whether optimizations can be performed for different blocks of instructions. The control flow data may also be used to track the execution of the instructions to determine whether an instruction in a block of instructions is assigned to a thread, a process, and/or an execution core of a processor, and to determine whether errors have occurred during the execution of the instructions.
Abstract:
Embodiments of techniques and systems associated with binary translation (BT) in computing systems are disclosed. In some embodiments, a BT task to be processed may be identified. The BT task may be associated with a set of code and may be identified during execution of the set of code on a first processing core of the computing device. The BT task may be queued in a queue accessible to a second processing core of the computing device, the second processing core being different from the first processing core. In response to a determination that the second processing core is in an idle state or has received an instruction through an operating system to enter an idle state, at least some of the BT task may be processed using the second processing core. Other embodiments may be described and/or claimed.
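The queueing behaviour described here can be sketched in a few lines. This is a minimal model under stated assumptions: `BTTask`, `Core`, `identify_bt_task`, `on_idle_opportunity`, the notion of "work units", and the per-call budget are all hypothetical names and simplifications, chosen only to show that a second core may process part of a queued task when it is idle or has been asked to become idle.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class BTTask:
    code_region: str   # the set of code the task is associated with
    work_units: int    # assumed abstract measure of remaining translation work


class Core:
    def __init__(self, core_id: int) -> None:
        self.core_id = core_id
        self.idle = False             # core is already in an idle state
        self.idle_requested = False   # OS has instructed the core to enter idle


def identify_bt_task(queue: Deque[BTTask], code_region: str, work_units: int) -> None:
    """Called from the first core while it executes the code: enqueue a BT task."""
    queue.append(BTTask(code_region, work_units))


def on_idle_opportunity(core: Core, queue: Deque[BTTask], budget: int) -> int:
    """Process up to `budget` work units on the second core; return units done."""
    if not (core.idle or core.idle_requested):
        return 0
    done = 0
    while queue and done < budget:
        task = queue[0]
        step = min(task.work_units, budget - done)
        task.work_units -= step
        done += step
        if task.work_units == 0:
            queue.popleft()   # task fully processed
    return done
```

The budget models "at least some of the BT task": a task that is not finished stays at the head of the queue for the next idle opportunity.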
Abstract:
Technologies for partial binary translation on multi-core platforms include a shared translation cache, a binary translation thread scheduler, a global installation thread, and a local translation thread and analysis thread for each processor core. On detection of a hotspot, the thread scheduler first resumes the global thread if suspended, next activates the global thread if a translation cache operation is pending, and last schedules local translation or analysis threads for execution. Translation cache operations are centralized in the global thread and decoupled from analysis and translation. The thread scheduler may execute in a non-preemptive nucleus, and the translation and analysis threads may execute in a preemptive runtime. The global thread may be primarily preemptive with a small non-preemptive nucleus to commit updates to the shared translation cache. The global thread may migrate to any of the processor cores. Forward progress is guaranteed. Other embodiments are described and claimed.
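The three-step priority order of the thread scheduler can be written as a short decision function. This is a sketch only: `SchedulerState`, its fields, and the returned strings are hypothetical, and the model ignores preemption, thread migration, and the actual translation cache.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchedulerState:
    global_suspended: bool = False    # global installation thread was suspended
    cache_ops_pending: int = 0        # pending translation cache operations
    local_ready: List[str] = field(default_factory=list)  # ready local threads


def schedule_on_hotspot(state: SchedulerState) -> str:
    """Pick the next action in the priority order described in the abstract."""
    # 1. Resume the global thread if it was suspended.
    if state.global_suspended:
        state.global_suspended = False
        return "resume global thread"
    # 2. Activate the global thread if a translation cache operation is pending.
    if state.cache_ops_pending > 0:
        return "activate global thread"
    # 3. Otherwise schedule a local translation or analysis thread.
    if state.local_ready:
        return "run " + state.local_ready.pop(0)
    return "nothing to schedule"
```

Giving the global thread priority in the first two steps reflects the abstract's point that translation cache operations are centralized in that thread, so local work is scheduled only when no cache operation is waiting.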