Abstract:
PROBLEM TO BE SOLVED: To access memory locations in only a limited range of a memory in response to a vector memory access instruction. SOLUTION: A processor 100 includes a plurality of packed data registers 107 and execution logic 109 coupled with the packed data registers. A limited range vector memory access instruction 103 indicates source packed memory indices, which comprise a plurality of packed memory indices selected from 8-bit memory indices and 16-bit memory indices. In response to the limited range vector memory access instruction, the execution logic is operable to access memory locations in only a limited range of a memory.
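The effect of narrow indices on the reachable range can be sketched in software. This is only an illustrative model, not the patented hardware: the function name `limited_range_gather`, the `base` parameter, the treatment of indices as unsigned, and element-granular addressing are all assumptions made for the example.

```python
def limited_range_gather(memory, base, indices, index_bits=8):
    """Gather memory[base + idx] for each packed index.

    Assumption: indices are unsigned and index_bits wide, so the access
    can never reach beyond base + 2**index_bits - 1.
    """
    if index_bits not in (8, 16):
        raise ValueError("index_bits must be 8 or 16")
    limit = 1 << index_bits
    gathered = []
    for idx in indices:
        if not 0 <= idx < limit:
            raise ValueError(f"index {idx} does not fit in {index_bits} bits")
        gathered.append(memory[base + idx])
    return gathered


memory = list(range(1000))
print(limited_range_gather(memory, 100, [0, 5, 255]))
```

With 8-bit indices, only the 256 elements starting at `base` are reachable in this model; 16-bit indices widen that window to 65,536 elements.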
Abstract:
PROBLEM TO BE SOLVED: To provide efficient vector roll operation. SOLUTION: A resultant rolled version of an input vector is created by: forming a first intermediate vector by barrel-rolling elements of the input vector along a first of two lanes defined by an upper half and a lower half of the input vector; forming a second intermediate vector by barrel-rolling elements of the input vector along a second of the two lanes; and forming the resultant rolled version of the input vector by incorporating upper portions of one of the intermediate vector's upper and lower halves as upper portions of the resultant's upper and lower halves and incorporating lower portions of the other intermediate vector's upper and lower halves as lower portions of the resultant's upper and lower halves.
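One possible reading of this two-lane construction can be sketched as follows. The exact way the abstract forms its intermediate vectors is not fully specified, so this sketch assumes that the first intermediate rotates each half in place and the second does the same with the halves swapped; the names `rotate_lane` and `roll_via_lanes` are hypothetical.

```python
def rotate_lane(lane, k):
    """Rotate one lane toward higher indices by k positions, wrapping in the lane."""
    k %= len(lane)
    return lane[-k:] + lane[:-k] if k else list(lane)


def roll_via_lanes(vec, k):
    """Roll a whole vector by k (0 <= k <= half) using only lane-local rotations."""
    half = len(vec) // 2
    if len(vec) % 2 or not 0 <= k <= half:
        raise ValueError("need an even-length vector and 0 <= k <= len(vec) // 2")
    lo, hi = vec[:half], vec[half:]
    # Assumed first intermediate: each lane rotated within itself.
    first = rotate_lane(lo, k) + rotate_lane(hi, k)
    # Assumed second intermediate: the same rotations with the lanes swapped.
    second = rotate_lane(hi, k) + rotate_lane(lo, k)
    # Blend: the upper portion of each half comes from one intermediate,
    # the lower portion (elements that crossed a lane boundary) from the other.
    return [first[i] if i % half >= k else second[i] for i in range(len(vec))]


print(roll_via_lanes(list(range(8)), 3))
```

Under these assumptions the result equals an ordinary full-width roll, which is why two cheap in-lane rotations plus a blend can stand in for a cross-lane roll.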
Abstract:
PROBLEM TO BE SOLVED: To provide a common operation means that makes it possible to adjust mask bits within writemask registers that correspond to elements in a vector register referred to in a SIMD operation instruction. SOLUTION: The execution of a KZBTZ instruction detects the trailing least significant zero bit position in a first input mask and sets an output mask to the values of the first input mask, but with all bit positions closer to the most significant bit position than that trailing least significant zero bit position set to zero. In some embodiments, a second input mask is used as a writemask such that bit positions of the first input mask are not considered in the trailing least significant zero bit position calculation depending upon a corresponding bit position in the second input mask.
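The mask computation described above can be modeled on plain integers. This is a sketch under stated assumptions: the function name `kzbtz`, the default 16-bit mask width, and the convention that a writemask bit of 0 excludes that position from the search are all choices made for the example, not details confirmed by the abstract.

```python
def kzbtz(src, width=16, writemask=None):
    """Keep src up to its lowest zero bit and clear every bit above it.

    Assumption: positions whose writemask bit is 0 are skipped when
    searching for the lowest zero bit.
    """
    full = (1 << width) - 1
    src &= full
    if writemask is None:
        writemask = full
    for pos in range(width):
        considered = (writemask >> pos) & 1
        if considered and not (src >> pos) & 1:
            # Bit pos is already zero, so keeping bits below pos is enough.
            return src & ((1 << pos) - 1)
    return src  # no considered zero bit: the mask passes through unchanged


print(bin(kzbtz(0b10110111, width=8)))
print(bin(kzbtz(0b10110111, width=8, writemask=0b11110000)))
```

In the first call the lowest zero bit is at position 3, so only the three low bits survive. In the second call positions 0 to 3 are excluded, so the search stops at position 6 instead.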
Abstract:
PROBLEM TO BE SOLVED: To provide vector compress and rotate functionality capable of increasing performance and instruction throughput, and decreasing power use. SOLUTION: In response to an instruction specifying a vector source operand, a mask register, a vector destination operand and a vector destination offset, a mask is read, and corresponding unmasked vector elements are copied from a vector source to adjacent sequential locations in a vector destination, starting at a vector destination offset location.
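A small software model of this compress-and-rotate behavior follows. It is a sketch, not the instruction's definition: the name `compress_rotate` is hypothetical, a mask bit of 1 is assumed to mean "unmasked", writes are assumed to wrap around to element 0 when they run past the end of the destination (the "rotate" part), and untouched destination elements are assumed to keep their old values.

```python
def compress_rotate(src, mask, dest, offset):
    """Copy unmasked src elements to consecutive dest slots starting at offset."""
    n = len(dest)
    out = list(dest)          # untouched elements keep their old values
    pos = offset % n
    for i, elem in enumerate(src):
        if (mask >> i) & 1:   # assumption: bit value 1 marks an unmasked element
            out[pos] = elem
            pos = (pos + 1) % n  # assumption: wrap around at the end
    return out


src = [10, 11, 12, 13, 14, 15, 16, 17]
print(compress_rotate(src, 0b10100110, [0] * 8, offset=6))
```

Here elements 1, 2, 5 and 7 are selected; the first two land in slots 6 and 7, and the remaining two wrap to slots 0 and 1.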
Abstract:
PROBLEM TO BE SOLVED: To provide: embodiments of an instruction generically called a square-multiply (SQRMUL) instruction; and systems, architectures and instruction formats used to improve latency. SOLUTION: Two source registers 101 and 103 hold values A and B, respectively. These values are processed by execution logic 107 to produce A², A*B, and B². These results are stored in a destination register 105. This register may be a general-purpose register such as a doubleword sized register, or a packed-data register with data element positions dedicated to storing calculated values.
Abstract:
PROBLEM TO BE SOLVED: To provide systems, methods and apparatuses for execution of an instruction that uses a control vector to zero out bits starting at a specific position in each data element of a source in a SIMD processing system. SOLUTION: The execution of a VPBZHI causes, on a per data element basis of a second source, a zeroing of bits higher (more significant) than a starting point in the data element. The starting point is defined by the contents of a data element in a first source. The resultant data elements are stored in a corresponding data element position of a destination.
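The per-element zeroing can be sketched on lists of integers. The abstract does not say whether the bit at the starting point itself is cleared, so this example borrows the convention of the scalar BMI2 `BZHI` instruction as an assumption: bits at position n and above are cleared, and a starting point at or beyond the element width leaves the element unchanged. The function name `vpbzhi` and the 32-bit default width are likewise illustrative.

```python
def vpbzhi(control, src, width=32):
    """Per element, clear the bits of src at and above the position in control.

    Assumption: follows the scalar BZHI convention (bit n itself is cleared;
    n >= width leaves the element unchanged).
    """
    full = (1 << width) - 1
    result = []
    for n, value in zip(control, src):
        value &= full
        if n < width:
            value &= (1 << n) - 1  # keep only bits 0 .. n-1
        result.append(value)
    return result


out = vpbzhi([4, 0, 32, 8], [0xFFFFFFFF, 0x1234, 0xDEADBEEF, 0xABCD])
print([hex(v) for v in out])
```

Each destination element depends only on the matching pair of control and source elements, which is what makes the operation a natural fit for SIMD execution.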
Abstract:
PROBLEM TO BE SOLVED: To provide instructions and logic that provide vectorization of conditional loops. SOLUTION: A vector expand instruction has a parameter to specify a source vector, a parameter to specify a conditions mask register, and a destination parameter to specify a destination register to hold n consecutive vector elements. Each of the n consecutive vector elements has an equal variable partition size of m bytes. In response to the processor instruction, data is copied from consecutive vector elements in the source vector to unmasked vector elements of the specified destination vector, where n varies according to the processor instruction executed.
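The expand step can be modeled as the inverse of a compress. In this sketch the name `vector_expand` is hypothetical, a mask bit of 1 is assumed to mark an unmasked destination element, and masked destination elements are assumed to keep their previous contents; the abstract does not fix either convention.

```python
def vector_expand(src, mask, dest):
    """Spread consecutive src elements into the unmasked positions of dest."""
    out = list(dest)          # masked positions keep their old values
    next_src = 0
    for i in range(len(out)):
        if (mask >> i) & 1:   # assumption: bit value 1 marks an unmasked element
            out[i] = src[next_src]
            next_src += 1
    return out


print(vector_expand([1, 2, 3], 0b01011, [0, 0, 0, 0, 0]))
```

In a vectorized conditional loop, the mask would typically record which iterations satisfied the condition, so that packed results can be scattered back to just those lanes.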
Abstract:
Embodiments of systems, apparatuses, and methods are described for performing, in a computer processor, a data element shuffle and an operation on the shuffled data elements in response to a single data element shuffle and operation instruction that includes a destination vector register operand, first and second source vector register operands, an immediate value, and an opcode.