Abstract:
Embodiments of systems, apparatuses, and methods for performing a blend instruction in a computer processor are described. In some embodiments, executing a blend instruction selects, element by element, between the data elements of a first and a second source operand, using the corresponding bit positions of a writemask as the selector, and stores each selected data element at the corresponding position in the destination.
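The element-by-element selection described above can be sketched in a few lines. This is only an illustrative model, not the patented implementation: the function name `blend` is hypothetical, vectors are modeled as Python lists, and the polarity (a set mask bit selects the second source) is an assumption, since the abstract does not say which operand a set bit chooses.

```python
def blend(src1, src2, writemask):
    """Model of a blend: bit i of writemask selects between src1[i] and src2[i].

    Assumption: a set bit picks the element from src2, a clear bit from src1.
    """
    if len(src1) != len(src2):
        raise ValueError("source operands must have the same element count")
    dest = []
    for i, (a, b) in enumerate(zip(src1, src2)):
        bit = (writemask >> i) & 1      # selector for element position i
        dest.append(b if bit else a)    # store the selection at position i
    return dest


result = blend([1, 2, 3, 4], [10, 20, 30, 40], 0b0101)
```

With the mask `0b0101`, positions 0 and 2 come from the second source and positions 1 and 3 from the first.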
Abstract:
An instruction indicating a source operand and a destination operand is received. A result is stored in the destination operand in response to the instruction. The result operand may have: (1) a first range of bits with a first end explicitly specified by the instruction, in which each bit is identical in value to the bit of the source operand in the corresponding position, and (2) a second range of bits that all have the same value regardless of the values of the bits of the source operand in the corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and a machine-readable medium storing such an instruction are also disclosed.
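A minimal sketch of such a bit-range operation follows. The function name `isolate_low_bits` and its parameters are hypothetical. It assumes the first range starts at bit 0 and runs up to the explicitly given end index, and that the second range is filled with a single value (0 by default). The point it illustrates is that the kept bits stay in their original positions, with no shift.

```python
def isolate_low_bits(source, end, width=64, fill=0):
    """Keep bits [0, end) of source in place; force bits [end, width) to fill.

    Assumptions: the kept range starts at bit 0, and fill is 0 or 1.
    """
    if not 0 <= end <= width:
        raise ValueError("end must lie within the operand width")
    full = (1 << width) - 1
    keep = (1 << end) - 1               # ones over the first range
    result = source & keep              # first range: copied, not shifted
    if fill:
        result |= full & ~keep          # second range: all bits set to 1
    return result


low = isolate_low_bits(0b10110110, 4, width=8)
high_filled = isolate_low_bits(0b10110110, 4, width=8, fill=1)
```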
Abstract:
A method of one aspect of the invention may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
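As a rough illustration, a rotate whose result depends only on the source operand and the rotate amount can be modeled as a pure function. The name `rotate_right`, the 32-bit default width, and the rotation direction are assumptions made for this sketch; the abstract specifies none of them.

```python
def rotate_right(value, amount, width=32):
    """Rotate value right by amount bits within a width-bit operand.

    The result is a function of (value, amount) alone: no carry flag or
    other status state is read, which mirrors the property described above.
    """
    mask = (1 << width) - 1
    value &= mask
    amount %= width                     # rotating by the full width is a no-op
    if amount == 0:
        return value
    return ((value >> amount) | (value << (width - amount))) & mask


rotated = rotate_right(0x12345678, 8)
```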
Abstract:
In several embodiments, vector extensions to an instruction set architecture include instructions to perform saturated signed and unsigned integer additions. In one embodiment, a vector signed integer add with signed saturation is provided. In one embodiment, a vector unsigned integer add with unsigned saturation is provided. In one embodiment, packed doubleword and quadword integers are supported for both signed and unsigned instructions.
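The difference between a saturating add and an ordinary wrapping add can be sketched as follows. The function names and the list-based vector model are hypothetical, and the inputs are assumed to already fit the chosen element width (32 bits for doublewords, 64 bits for quadwords).

```python
def add_saturate_signed(a, b, bits=32):
    """Per-element signed add, clamped to [-2**(bits-1), 2**(bits-1) - 1]."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return [max(lo, min(hi, x + y)) for x, y in zip(a, b)]


def add_saturate_unsigned(a, b, bits=32):
    """Per-element unsigned add, clamped to [0, 2**bits - 1]."""
    hi = (1 << bits) - 1
    return [min(hi, x + y) for x, y in zip(a, b)]


signed_sum = add_saturate_signed([2**31 - 1, -2**31, 5], [1, -1, 7])
unsigned_sum = add_saturate_unsigned([2**32 - 1, 10], [1, 20])
```

Sums that would overflow stay pinned at the nearest representable limit instead of wrapping around.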
Abstract:
A packed data operation mask comparison instruction is received that indicates a first packed data operation mask having first packed data operation mask bits and a second packed data operation mask having second packed data operation mask bits. Each bit of the first mask corresponds to the bit of the second mask in the corresponding position. A first flag is modified to a first value if the bitwise AND of each bit of the first mask with the corresponding bit of the second mask is zero; otherwise, the first flag is modified to a second value. A second flag is modified to a third value if the bitwise AND of each bit of the first mask with the bitwise NOT of the corresponding bit of the second mask is zero; otherwise, the second flag is modified to a fourth value.
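The two flag computations can be sketched as below. The name `mask_test`, the 16-bit default mask width, and the concrete flag values (1 when the tested expression is zero, 0 otherwise) are assumptions; the abstract only speaks of first through fourth values.

```python
def mask_test(mask1, mask2, width=16):
    """Return (first_flag, second_flag) for two operation masks.

    first_flag  is 1 when (mask1 AND mask2) is zero in every bit position.
    second_flag is 1 when (mask1 AND NOT mask2) is zero in every bit position.
    """
    full = (1 << width) - 1
    m1 = mask1 & full
    m2 = mask2 & full
    first_flag = 1 if (m1 & m2) == 0 else 0
    second_flag = 1 if (m1 & ~m2 & full) == 0 else 0
    return first_flag, second_flag


disjoint = mask_test(0b1100, 0b0011)    # no common bits
subset = mask_test(0b0101, 0b0111)      # every bit of mask1 is also in mask2
```

Read this way, the first flag reports that the masks share no set bits, and the second reports that the first mask is a subset of the second.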
Abstract:
A method of an aspect includes receiving a masked packed rotate instruction. The instruction indicates a first source packed data including a plurality of packed data elements, a packed data operation mask having a plurality of mask elements, at least one rotation amount, and a destination storage location. A result packed data is stored in the destination storage location in response to the instruction. The result packed data includes result data elements that each correspond to a different one of the mask elements in a corresponding relative position. Result data elements that are not masked out by the corresponding mask element include one of the data elements of the first source packed data in a corresponding position that has been rotated. Result data elements that are masked out by the corresponding mask element include a masked out value. Other methods, apparatus, systems, and instructions are disclosed.
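A minimal sketch of the masked rotate, under several assumptions that the abstract leaves open: a set mask bit means the element is not masked out, the masked-out value is zero, a single rotation amount applies to all elements, and the rotation is to the left. The function name and list-based vector model are hypothetical.

```python
def masked_packed_rotate_left(src, mask, amount, width=32, masked_out=0):
    """Rotate each unmasked element left by amount; others get masked_out."""
    lane_mask = (1 << width) - 1
    amount %= width
    result = []
    for i, elem in enumerate(src):
        if (mask >> i) & 1:             # mask element i: not masked out
            elem &= lane_mask
            rotated = ((elem << amount) | (elem >> (width - amount))) & lane_mask
            result.append(rotated)
        else:                           # masked out: store the masked-out value
            result.append(masked_out)
    return result


packed = masked_packed_rotate_left([0x80000001, 0x1, 0xF0000000], 0b101, 4)
```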
Abstract:
A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
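The per-element computation can be sketched as the source value minus its rounding to a given number of fraction bits. The function name is hypothetical, and the rounding mode is an assumption: Python's built-in `round` rounds halves to even, whereas the abstract does not name a rounding mode.

```python
def round_off_amount(values, fraction_bits):
    """For each x, return x minus x rounded to fraction_bits binary fraction bits.

    Assumption: round-half-to-even, which is what Python's round() provides.
    """
    scale = 2.0 ** fraction_bits
    result = []
    for x in values:
        rounded = round(x * scale) / scale   # keep fraction_bits bits after the radix point
        result.append(x - rounded)           # the amount removed by rounding
    return result


amounts = round_off_amount([1.3125, 2.75], 2)
```

Here 1.3125 rounds to 1.25 at two fraction bits, leaving 0.0625, while 2.75 is already representable and leaves 0.0.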
Abstract:
A processor core includes a hardware decode unit to decode a vector frequency compress instruction that includes a source operand and a destination operand. The source operand specifies a source vector register that includes a plurality of source data elements, including one or more runs of identical data elements that are each to be compressed in a destination vector register as a value and run length pair. The destination operand identifies the destination vector register. The processor core also includes an execution engine unit to execute the decoded vector frequency compress instruction, which causes, for each source data element, a value to be copied into the destination vector register to indicate that source data element's value. One or more runs of the source data elements equal to a predetermined compression value are encoded in the destination vector register as the predetermined compression value followed by a run length for that run.
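One possible reading of this encoding is sketched below. It assumes that only runs of elements equal to a predetermined compression value (zero here) are replaced by a value and run length pair, while all other elements are copied unchanged. The function name, the default compression value, and the list-based register model are hypothetical.

```python
def frequency_compress(src, compress_value=0):
    """Run-length encode runs of compress_value; copy other elements as-is.

    Assumption: only elements equal to compress_value are run-length encoded,
    each run becoming the pair (compress_value, run_length).
    """
    dest = []
    i = 0
    while i < len(src):
        if src[i] == compress_value:
            run = 1
            while i + run < len(src) and src[i + run] == compress_value:
                run += 1                      # extend the current run
            dest.extend([compress_value, run])
            i += run
        else:
            dest.append(src[i])               # ordinary element: copied through
            i += 1
    return dest


compressed = frequency_compress([5, 0, 0, 0, 7, 0, 9])
```

Under this reading, a run of length one still costs two destination elements, so the scheme pays off only when the compression value occurs in longer runs.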