Abstract:
A processor including a decode unit to receive a vector indexed load plus arithmetic and/or logical (A/L) operation plus store instruction. The instruction is to indicate a source packed memory indices operand that is to have a plurality of packed memory indices. The instruction is also to indicate a source packed data operand that is to have a plurality of packed data elements. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to load a plurality of data elements from memory locations corresponding to the plurality of packed memory indices, perform A/L operations on the plurality of packed data elements of the source packed data operand and the loaded plurality of data elements, and store a plurality of result data elements in the memory locations corresponding to the plurality of packed memory indices.
Abstract:
An apparatus and method are described for performing conflict detection operations. For example, one embodiment of a processor comprises: a first source vector register to store a first set of data elements; a second source vector register to store a second set of data elements; conflict detection logic to perform a specified comparison operation comparing each of the first set of data elements with specified data elements from the second set and generating a set of comparison results, the comparison operation to be selected from a group consisting of a greater than comparison, a less than comparison, a greater than or equal to comparison, a less than or equal to comparison, and a not equal to comparison.
Abstract:
Embodiments described herein utilize restricted transactional memory (RTM) instructions to implement speculative compile time optimizations that will be automatically rolled back by hardware in the event of a missed speculation. In one embodiment, a lightweight version of RTM for speculative compiler optimization is described to provide lower operational overhead in comparison to conventional RTM implementations used when performing SLE.
Abstract:
An apparatus and method are described for performing a bit reversal and permutation on mask values. For example, a processor is described to execute an instruction to perform the operations of: reading a plurality of mask bits stored in a source mask register, the mask bits associated with vector data elements of a vector register; and performing a bit reversal operation to copy each mask bit from a source mask register to a destination mask register, wherein the bit reversal operation causes bits from the source mask register to be reversed within the destination mask register resulting in a symmetric, mirror image of the original bit arrangement.
Abstract:
In one embodiment of the invention, a processor including a storage location configured to store a set of source packed-data operands, each of the operands having a plurality of packed-data elements that are positive or negative according to an immediate bit value within one of the operands. The processor also including: a decoder to decode an instruction requiring an input of a plurality of source operands, and an execution unit to receive the decoded instructions and to generate a result that is a sum of the source operands. In one embodiment, the result is stored back into one of the source operands or the result is stored into an operand that is independent of the source operands.
Abstract:
Un procesador que comprende: un decodificador para decodificar una instrucción de reorganización de bits de un vector, comprendiendo la instrucción de reorganización de bits de un vector un primer operando de origen, un segundo operando de origen y un operando de destino; un primer registro de vector identificado por el primer operando de origen para almacenar una pluralidad de elementos de datos de origen; un segundo registro de vector identificado por el segundo operando de origen para almacenar una pluralidad de elementos de control, correspondiendo cada uno de los elementos de control a uno diferente de una pluralidad de elementos de datos de origen en el primer registro de vector y comprendiendo una pluralidad de campos de bit, correspondiendo cada uno de los campos de bit a una única posición de bit en un registro de máscara de destino identificado por el operando de destino, y sirviendo además cada uno de los campos de bit para identificar exactamente un bit del elemento de datos de origen correspondiente para copiarse a la posición única de bit correspondiente en el registro de máscara de destino; y una lógica de reordenación de bits de vector para leer los campos de bits del segundo registro de vector y, para cada uno de los campos de bit, identificar exactamente un bit de los elementos de datos de origen y copiar como consecuencia únicamente el bit identificado del elemento de datos de origen a una única posición de bit correspondiente al campo de bit en el registro de máscara de destino.
Abstract:
Hier dargelegte Ausführungsformen betreffen Systeme und Verfahren zum Laden eines Kachelregisterpaars. In einem Beispiel umfasst ein Prozessor Decodierschaltkreise zum Decodieren einer Ladematrixpaaranweisung mit Feldern für einen Opcode und Quellen- und Zielkennungen zum Identifizieren von Quellen- bzw. Zielmatrizen, wobei jede Matrix einen PAIR-Parameter gleich TRUE aufweist; und Ausführungsschaltkreise zum Ausführen der decodierten Ladematrixpaaranweisungen zum Laden jedes Elements linker und rechter Kacheln der identifizierten Zielmatrix aus entsprechenden Elementpositionen von linken bzw. rechten Kacheln der identifizierten Quellenmatrix, wobei das Ausführen beginnend mit der ersten Zeile an einer Zeile der identifizierten Zielmatrix auf einmal operiert.