Abstract:
PROBLEM TO BE SOLVED: To provide a processor that reduces overhead in gathering and scattering multiple data elements. SOLUTION: Efficient data transfer operations can be achieved by: decoding, by a processor device 140, 160, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an operation execution unit in the processor; detecting occurrence of an exception during execution of the single instruction; and, in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception.
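The abstract does not specify how a partially completed gather resumes. The sketch below is a hypothetical model that shows the stated ordering: pending traps or interrupts reach the handler before the exception raised by the faulting element. It assumes that each completed element clears its mask bit, so that re-executing the instruction after the handler resumes at the faulting element. The names `gather`, `memory`, `pending`, and `handler_log` are illustrative only.

```python
def gather(memory, addresses, mask, dest, pending, handler_log):
    """Hypothetical model of one gather instruction over several elements.

    memory: dict mapping address -> value; a missing address models a fault.
    mask: one bool per element; a completed element's bit is cleared so that
          re-executing after the handler resumes at the faulting element
          (an assumption of this sketch, not taken from the abstract).
    pending: list of traps/interrupts that arrived before the fault.
    handler_log: list standing in for the exception handler's event queue.
    Returns True if every masked element completed, False on an exception.
    """
    for i, addr in enumerate(addresses):
        if not mask[i]:
            continue  # element already transferred (or never requested)
        if addr not in memory:
            # Deliver pending traps/interrupts first, then the exception itself.
            handler_log.extend(pending)
            pending.clear()
            handler_log.append(("fault", addr))
            return False
        dest[i] = memory[addr]
        mask[i] = False  # record progress so a restart skips this element
    return True
```

After the handler makes the missing address available (here, by adding it to `memory`), calling `gather` again with the same mask completes only the remaining elements.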
Abstract:
PROBLEM TO BE SOLVED: To enable gathering and scattering of multiple data elements. SOLUTION: Efficient data transfer processing can be achieved by: a step of decoding, by a processor device, a single instruction specifying transfer processing for a plurality of data elements between a first storage area and a second storage area; a step of issuing the single instruction for execution by an execution unit in the processor; a step of detecting the occurrence of an exception during execution of the single instruction; and, in response to the exception, a step of delivering pending traps or interrupts to an exception handler before delivering the exception. COPYRIGHT: (C)2011,JPO&INPIT
Abstract:
A processor executes a vector move instruction to move data elements from a second vector register to a first vector register under the control of a first mask register and a second mask register. A register file within the processor includes the first vector register, the second vector register, the first mask register and the second mask register. In response to the vector move instruction, execution circuitry in the processor is to replace a given number of target data elements in the first vector register with the given number of source data elements in the second vector register. Each source data element corresponds to a mask bit in the second mask register having a second bit value, and each target data element corresponds to a mask bit in the first mask register having a first bit value.
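A minimal Python sketch of the masked move, modelling registers as lists. The abstract leaves several details open, so three assumptions apply: the first and second bit values are passed in as parameters, source and target elements are paired in ascending index order, and the "given number" is taken to be the smaller of the two eligible counts. The function name `vector_move` is hypothetical.

```python
def vector_move(v1, k1, v2, k2, first_bit_value, second_bit_value):
    """Hypothetical model of the two-mask vector move.

    Target slots: positions in v1 whose bit in k1 equals first_bit_value.
    Source elements: positions in v2 whose bit in k2 equals second_bit_value.
    Pairs are formed lowest index first (an assumption); the number moved
    is the smaller of the two counts (also an assumption).
    Returns (new_v1, number_moved); the inputs are not modified.
    """
    targets = [i for i, bit in enumerate(k1) if bit == first_bit_value]
    sources = [j for j, bit in enumerate(k2) if bit == second_bit_value]
    out = list(v1)
    moved = 0
    for t, s in zip(targets, sources):  # zip stops at the shorter list
        out[t] = v2[s]
        moved += 1
    return out, moved
```

For example, `vector_move([1, 2, 3, 4], [1, 0, 1, 0], [9, 8, 7, 6], [0, 1, 1, 0], 0, 1)` fills target slots 1 and 3 with source elements 1 and 2, giving `([1, 8, 3, 7], 2)`.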
Abstract:
In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.
Abstract:
A system and method for the design and operation of a distributed shared cache in a multi-core processor is disclosed. In one embodiment, the shared cache may be distributed among multiple cache molecules. Each of the cache molecules may be closest, in terms of access latency time, to one of the processor cores. In one embodiment, a cache line brought in from memory may initially be placed into a cache molecule that is not closest to a requesting processor core. When the requesting processor core makes repeated accesses to that cache line, it may be moved either between cache molecules or within a cache molecule. Due to the ability to move the cache lines within the cache, in various embodiments special search methods may be used to locate a particular cache line.
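The following toy model illustrates the placement and migration ideas in this abstract. Several details are assumptions: molecules sit in a line, molecule `i` is the one closest to core `i`, a line moves one molecule toward the requester after a hypothetical `threshold` of accesses, and lookup simply searches molecules in order of distance from the requester. The model covers only movement between molecules, not movement within a molecule, and it does not reproduce the special search methods the abstract mentions.

```python
class DistributedCache:
    """Toy model of a shared cache split into molecules (assumed linear layout).

    Molecule i is assumed to be the one closest to core i.
    """

    def __init__(self, n_molecules, threshold=2):
        self.molecules = [set() for _ in range(n_molecules)]
        self.counts = {}  # (line, core) -> accesses since the line last moved
        self.threshold = threshold  # hypothetical migration trigger

    def fill(self, line, molecule):
        # A line brought in from memory may land in a molecule that is
        # not the one closest to the requesting core.
        self.molecules[molecule].add(line)

    def locate(self, line, core):
        # Lines can move, so search the closest molecule first, then
        # progressively farther ones.
        order = sorted(range(len(self.molecules)), key=lambda m: abs(m - core))
        for m in order:
            if line in self.molecules[m]:
                return m
        return None

    def access(self, line, core):
        m = self.locate(line, core)
        if m is None:
            return None  # miss; a real cache would fetch from memory
        key = (line, core)
        self.counts[key] = self.counts.get(key, 0) + 1
        if m != core and self.counts[key] >= self.threshold:
            # Repeated use by this core: move the line one molecule closer.
            step = 1 if core > m else -1
            self.molecules[m].remove(line)
            self.molecules[m + step].add(line)
            self.counts[key] = 0
            m += step
        return m
```

With four molecules and `threshold=2`, a line filled into molecule 0 and accessed repeatedly by core 3 drifts toward molecule 3 by one step every two accesses.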
Abstract:
A system and method for the design and operation of a cache system with differing cache location lengths in level one caches is disclosed. In one embodiment, each level one cache may include groups of cache locations of differing length, capable of holding portions of a level two cache line. A state tree may be created from data in a sharing vector. When a request arrives from a level one cache, the level two cache may examine the nodes of the state tree to determine whether the node of the state tree corresponding to the incoming request is already active. The results of this determination may be used to inhibit or permit the concurrent processing of the request.
Abstract:
A processor executes a mask update instruction to perform updates on a first mask register and a second mask register. A register file in the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register and also to invert the given number of mask bits in the second mask register.
Abstract:
A processor executes a mask update instruction to perform updates to a first mask register and a second mask register. A register file within the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register, and also to invert the given number of mask bits in the second mask register.
Abstract:
A processor executes a mask update instruction to perform updates to a first mask register K2 and a second mask register K1. In response to the mask update instruction, execution circuitry inverts a given number of mask bits in the first mask register from a first bit value (e.g. 1) indicating valid data to a second bit value (e.g. 0) indicating an available slot, and inverts the given number of mask bits in the second mask register from the second bit value to the first bit value. The execution circuitry further moves (e.g. merge 320) the given number of elements (e.g. 3, shown with value C0) from a first vector register V2 to a second vector register V1 at the same relative positions as the inverted bits in the second mask register K1.
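A short Python sketch of the mask update combined with the merge, using the bit meanings given in the abstract (1 = valid data, 0 = available slot). Which bits are chosen, and how V2 elements are paired with V1 slots, is not stated in the abstract. This sketch assumes the lowest-indexed eligible bits are used in both masks and that they are paired in ascending order. The function name `mask_update_merge` is hypothetical.

```python
def mask_update_merge(v1, k1, v2, k2, n):
    """Hypothetical model of the mask update plus merge.

    k2: mask for V2; 1 marks valid data waiting to be moved.
    k1: mask for V1; 0 marks an available slot.
    Up to n k2 bits go 1 -> 0 and the same number of k1 bits go 0 -> 1;
    each moved V2 element lands at the V1 position whose k1 bit flipped.
    Lowest-index selection and in-order pairing are assumptions.
    Returns new (v1, k1, k2); the inputs are not modified.
    """
    src = [i for i, bit in enumerate(k2) if bit == 1][:n]
    dst = [i for i, bit in enumerate(k1) if bit == 0][:n]
    v1, k1, k2 = list(v1), list(k1), list(k2)
    for s, d in zip(src, dst):  # moves min(len(src), len(dst)) elements
        k2[s] = 0       # V2 element consumed: slot now available
        k1[d] = 1       # V1 slot now holds valid data
        v1[d] = v2[s]   # merge the element into V1
    return v1, k1, k2
```

For example, with `k1 = [0, 1, 0, 1]`, `k2 = [0, 1, 1, 0]` and `n = 2`, elements `C1` and `C2` of V2 move into V1 positions 0 and 2, after which K1 is all ones and K2 is all zeros.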
Abstract:
Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register having a variable number of data fields, each of which is to store an offset for a data element in memory. A destination register has corresponding data fields, each of which is to store a variable second number of bits that holds a conflict mask with one mask bit per offset. In response to decoding a vector conflict instruction, execution units compare the offset in each data field with that of every less significant data field to determine whether they hold a matching offset. In the corresponding conflict mask in the destination register, they set every mask bit that corresponds to a less significant data field holding a matching offset. Vector address conflict detection can be used with variable-sized elements and to generate conflict masks that resolve dependencies in gather-modify-scatter SIMD operations.
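A scalar Python sketch of the conflict computation described above. For each lane `i`, bit `j` of its mask is set when a less significant lane `j < i` holds the same offset. The second function, `scatter_add`, is a hypothetical illustration of one way such masks can serialize a gather-modify-scatter (here, a histogram update). In each round, it processes the lanes that have no pending conflicting lane. Neither function name comes from the source.

```python
def vector_conflict(offsets):
    """Return one conflict mask per lane: bit j set if offsets[j] == offsets[i], j < i."""
    masks = []
    for i, off in enumerate(offsets):
        m = 0
        for j in range(i):  # compare only with less significant lanes
            if offsets[j] == off:
                m |= 1 << j
        masks.append(m)
    return masks


def scatter_add(hist, offsets, values):
    """Hypothetical gather-modify-scatter that uses conflict masks to avoid lost updates."""
    masks = vector_conflict(offsets)
    pending = (1 << len(offsets)) - 1  # one bit per lane still to commit
    while pending:
        # A lane is ready once none of its earlier conflicting lanes is pending,
        # so the lanes in one round always target distinct offsets.
        ready = [i for i in range(len(offsets))
                 if (pending >> i) & 1 and not (masks[i] & pending)]
        gathered = [hist[offsets[i]] for i in ready]    # gather
        for i, g in zip(ready, gathered):
            hist[offsets[i]] = g + values[i]            # modify and scatter
        for i in ready:
            pending &= ~(1 << i)
    return hist
```

For offsets `[5, 3, 5, 5, 3]`, `vector_conflict` yields `[0, 0, 1, 5, 2]`: lane 3 conflicts with lanes 0 and 2 (binary 101), and lane 4 conflicts with lane 1. The histogram update then completes in three rounds without losing any increment.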