Abstract:
Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of decomposing the original code and/or executing the generated threads.
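The abstract does not say how the decomposition is performed. As a purely illustrative, software-only sketch, assume the "original code" is a loop whose iterations are independent. That loop can be decomposed by splitting its iteration space into per-thread chunks and combining the partial results. The names `original_loop`, `decompose`, `generated_thread`, and `run_decomposed` are hypothetical. Python threads are used only to show the structure, because the GIL prevents CPU-bound threads from actually running in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def original_loop(data):
    """Sequential 'original code': sum of squares over all elements."""
    total = 0
    for x in data:
        total += x * x
    return total

def decompose(n_items, n_threads):
    """Split the iteration space [0, n_items) into n_threads contiguous chunks."""
    base, extra = divmod(n_items, n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        end = start + base + (1 if t < extra else 0)  # spread the remainder
        chunks.append((start, end))
        start = end
    return chunks

def generated_thread(data, start, end):
    """Body run by one generated thread: a partial sum over its chunk."""
    return sum(x * x for x in data[start:end])

def run_decomposed(data, n_threads=4):
    """Run the generated threads and combine their partial results."""
    chunks = decompose(len(data), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = pool.map(lambda c: generated_thread(data, *c), chunks)
        # Combining partial sums is valid here because addition is associative.
        return sum(partials)
```

For example, `run_decomposed(list(range(10)), 3)` splits the work into the chunks `(0, 4)`, `(4, 7)`, and `(7, 10)`. It returns 285, the same result as `original_loop(list(range(10)))`.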
Abstract:
Instruction control flow tracking: A mechanism is disclosed for tracking the control flow of instructions in an application and for performing one or more optimizations of a processing device based on that control flow. Control flow data is generated to indicate the control flow of blocks of instructions in the application. The control flow data may include annotations indicating whether optimizations can be performed for different blocks of instructions. The control flow data may also be used to track execution of the instructions. This tracking can determine whether an instruction in a block of instructions is assigned to a thread, a process, and/or an execution core of a processor, and whether errors occurred during execution of the instructions.
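The following sketch shows how control flow data of this kind might be checked against an observed execution trace. It assumes each block is described by its permitted destination blocks, a hypothetical `can_optimize` annotation, and a hypothetical `owner` annotation naming the thread the block is assigned to. The trace is assumed to be a list of `(block_id, thread_id)` pairs. None of these names or encodings come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BlockInfo:
    destinations: frozenset       # block IDs that may legally execute next
    can_optimize: bool = False    # hypothetical annotation: optimization permitted
    owner: Optional[int] = None   # hypothetical annotation: assigned thread ID

def track(control_flow, trace):
    """Check a trace of (block_id, thread_id) pairs against control flow data.

    Returns the blocks whose annotation permits optimization (in execution
    order) and a list of (position, block_id, reason) tuples for errors.
    """
    optimizable, errors = [], []
    for i, (block, thread) in enumerate(trace):
        info = control_flow.get(block)
        if info is None:
            errors.append((i, block, "unknown block"))
            continue
        if info.can_optimize:
            optimizable.append(block)
        if info.owner is not None and info.owner != thread:
            errors.append((i, block, "wrong thread"))
        # Control must transfer to one of the block's destination identifiers.
        if i + 1 < len(trace) and trace[i + 1][0] not in info.destinations:
            errors.append((i, block, "unexpected destination"))
    return optimizable, errors
```

Consider blocks `A -> {B, C}`, `B -> {A}`, and `C` (no successors), where `A` permits optimization. The trace `[("A", 0), ("B", 0), ("A", 0), ("C", 1)]` produces no errors. The trace `[("B", 0), ("C", 1)]`, however, is flagged as an unexpected destination at position 0.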
Abstract:
A method of operating a device that has a processor includes receiving control flow data. The control flow data includes block identifiers for blocks of instructions, destination identifiers for one or more of the blocks of instructions, and annotations for the blocks of instructions. The method further includes determining a destination identifier for a current instruction block based on the control flow data and identifying an annotation associated with the current instruction block based on the control flow data. Based on the annotation, the destination identifier, or both, the method then performs at least one of modifying resources used by the processor or tracking execution of the blocks of instructions. A processing device may be optimized according to the annotations by reducing the number of components it uses. Tracking the execution of the blocks of instructions may be used to determine whether a hard error in memory or a soft error in execution has occurred, with reference to an annotation indicating the number of instructions within the current instruction block.
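The error check in the last sentence can be illustrated with a minimal sketch. Assume each annotation records the block's instruction count and that the number of instructions observed to complete for the block is available. A mismatch then indicates an error, and an executed successor that is not among the block's destination identifiers is also flagged. The types `Annotation` and `ControlFlowEntry` and the function `check_block` are hypothetical. The sketch does not attempt to tell hard errors from soft errors, since the abstract does not say how that is done.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    instruction_count: int    # number of instructions in the block

@dataclass(frozen=True)
class ControlFlowEntry:
    destinations: tuple       # destination identifiers reachable from the block
    annotation: Annotation

def check_block(control_flow, block_id, retired, next_block=None):
    """Check one executed block against its control flow entry.

    `retired` is the number of instructions observed to complete in the block;
    `next_block` is the block that executed next (None at the end of a trace).
    Returns a list of detected problems (empty if none).
    """
    entry = control_flow[block_id]
    problems = []
    if retired != entry.annotation.instruction_count:
        # The count disagrees with the annotation: a possible hard error
        # (e.g. corrupted instructions in memory) or soft error in execution.
        problems.append("instruction count mismatch")
    if next_block is not None and next_block not in entry.destinations:
        problems.append("unexpected destination")
    return problems
```

For example, suppose `B0` is annotated with 5 instructions. Calling `check_block(cfd, "B0", 4, "B1")` then reports an instruction count mismatch, even though `B1` is a valid destination.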
Abstract:
A mechanism is disclosed for tracking the control flow in an application and for performing one or more optimizations of a processing device based on the control flow of the instructions in the application. Control flow data is generated to indicate the control flow of the instruction blocks in the application. The control flow data may include annotations indicating whether optimizations can be performed for the various instruction blocks. The control flow data may also be used to track execution of the instructions. This tracking can determine whether an instruction in an instruction block is assigned to a thread, a process, and/or an execution core of a processor, and whether errors occurred during execution of the instructions.
Abstract:
In one embodiment of the invention, a processor comprises an upper-level cache and at least one processor core. The at least one processor core includes one or more registers and a plurality of instruction processing stages. These include a decode unit to decode an instruction that requires as input a plurality of data elements, where each data element is smaller than the cache line size of the processor. They also include an execution unit to load the plurality of data elements into the one or more registers of the processor. The execution unit does so without loading into the upper-level cache either those data elements or the data elements spatially adjacent to them.
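The difference from a conventional load can be modeled in software, as a rough sketch only. Assume a 64-byte cache line, memory as a byte string, a cache as a dictionary of filled lines, and elements that never straddle a line boundary. A conventional gather fills every touched line into the cache, bringing spatially adjacent bytes along. The load described above returns only the requested elements and leaves the cache untouched. The names `CACHE_LINE`, `cached_gather`, and `uncached_gather` are hypothetical, and a real implementation would be a hardware datapath rather than code.

```python
CACHE_LINE = 64  # bytes; assumed cache line size

def cached_gather(memory, addrs, size, cache):
    """Conventional loads: each access fills its whole line into `cache`.

    Assumes no element straddles a cache-line boundary.
    """
    regs = []
    for a in addrs:
        line = a // CACHE_LINE
        if line not in cache:
            start = line * CACHE_LINE
            # The full line, including spatially adjacent bytes, is cached.
            cache[line] = memory[start:start + CACHE_LINE]
        offset = a - line * CACHE_LINE
        regs.append(cache[line][offset:offset + size])
    return regs

def uncached_gather(memory, addrs, size):
    """Load only the requested elements into 'registers'; allocate no cache lines."""
    return [memory[a:a + size] for a in addrs]
```

Gathering three 4-byte elements at addresses 0, 100, and 200 yields the same register contents either way. The conventional path, however, fills three 64-byte lines (192 bytes) to obtain 12 useful bytes, which is the cache pollution the described instruction avoids.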
Abstract:
Mechanisms are described herein for continuously and automatically tuning code regions to find optimal hardware configurations for those regions. One mechanism automatically tunes the tunable parameters for a demarcated code region in two steps. First, it computes metrics while the code region executes with different sets of tunable parameters. Second, it selects one of those sets based on the computed metrics.
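A minimal software sketch of the tuning loop follows. It assumes the demarcated code region is a callable that takes the tunable parameters as keyword arguments and that the metric is best-of-N wall-clock time. It also assumes every combination in a small parameter space is tried exhaustively. The names `measure` and `autotune` and the parameter names in the usage note are hypothetical, and a different metric can be passed in through `metric`.

```python
import itertools
import time

def measure(region, params, repeats=3):
    """Metric: best-of-N wall-clock time of running `region` with `params`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        region(**params)
        best = min(best, time.perf_counter() - start)
    return best

def autotune(region, space, metric=measure):
    """Run `region` with every parameter set in `space`; keep the lowest metric.

    `space` maps each tunable parameter name to its candidate values.
    Returns the selected parameter set and all computed metrics.
    """
    names = list(space)
    results = {}
    for values in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, values))
        results[values] = metric(region, params)
    best_values = min(results, key=results.get)
    return dict(zip(names, best_values)), results
```

Exhaustive search is the simplest choice and only suits small spaces. A "continuous" tuner in the sense of the abstract would presumably repeat this selection as the region keeps executing, which the sketch does not attempt. With a deterministic stand-in metric such as `lambda region, p: abs(p["block"] - 32) + p["unroll"]` over `{"block": [8, 16, 32, 64], "unroll": [1, 2, 4]}`, `autotune` selects `{"block": 32, "unroll": 1}`.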
Abstract:
Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method for decomposing original code and/or executing the generated threads.