Abstract:
PROBLEM TO BE SOLVED: To provide systems, methods and apparatuses for executing an instruction that uses a control vector to zero out bits starting at a specific position in each data element of a source in a SIMD processing system. SOLUTION: The execution of a VPBZHI instruction causes, on a per data element basis of a second source, a zeroing of bits higher (more significant) than a starting point in the data element. The starting point is defined by the contents of a data element in a first source. The resultant data elements are stored in a corresponding data element position of a destination.
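The per-element zeroing described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `vpbzhi`, the 32-bit default element width, the use of the low byte of each control element as the starting position, and the choice to clear the starting bit itself (as the scalar BZHI instruction does) are not taken from the abstract.

```python
def vpbzhi(src1, src2, width=32):
    """Model: for each element of src2, zero the bits at and above the
    starting position given by the matching element of src1."""
    full = (1 << width) - 1
    result = []
    for control, data in zip(src1, src2):
        # Assumption: the low byte of the control element holds the position.
        start = control & 0xFF
        if start >= width:
            # Starting point lies beyond the element: nothing is cleared.
            result.append(data & full)
        else:
            # Keep only the bits below the starting position.
            result.append(data & ((1 << start) - 1))
    return result
```

For example, `vpbzhi([4], [0xFF])` keeps only the low four bits of the data element.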
Abstract:
PROBLEM TO BE SOLVED: To provide vector compress and rotate functionality capable of increasing performance and instruction throughput, and decreasing power use. SOLUTION: In response to an instruction specifying a vector source operand, a mask register, a vector destination operand and a vector destination offset, a mask is read, and corresponding unmasked vector elements are copied from a vector source to adjacent sequential locations in a vector destination, starting at a vector destination offset location.
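A minimal sketch of the compress-and-rotate behavior follows. It assumes that a mask bit of 1 marks an element as unmasked and that writes wrap around to the start of the destination once they pass its last position (the "rotate" part); the abstract does not spell out either detail, and the function name `compress_rotate` is hypothetical.

```python
def compress_rotate(src, mask, dest, offset):
    """Copy unmasked elements of src into adjacent slots of dest,
    starting at offset and wrapping modulo the destination length."""
    out = list(dest)          # leave the caller's destination untouched
    n = len(out)
    pos = offset
    for element, bit in zip(src, mask):
        if bit:               # assumption: 1 means "unmasked"
            out[pos % n] = element   # assumption: wrap-around on overflow
            pos += 1
    return out
```

With a four-element destination and an offset of 2, the third unmasked element lands in slot 0.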
Abstract:
PROBLEM TO BE SOLVED: To provide instructions and logic that can fuse OR-test and AND-test functionality on multiple test sources. SOLUTION: A test instruction specifies first, second and third source data operands, and an operation type. Execution units, responsive to the decoded test instruction, perform one logical operation, according to the specified operation type, between data from the first and second source data operands, and perform a second logical operation between data from the third source data operand and the result of the first logical operation, to set a condition flag. Some embodiments generate a fused test instruction by dynamically fusing one logical instruction with a test instruction. Other embodiments generate a test instruction through a just-in-time compiler. Some embodiments further fuse a test instruction with a subsequent conditional branch instruction, and perform a branch according to how the condition flag is set.
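The three-operand test can be sketched as below. The assumptions are that the second logical operation is a bitwise AND (as in a conventional TEST instruction) and that the condition flag being set is a zero flag; the abstract names neither, and the function name `fused_test` is hypothetical.

```python
def fused_test(src1, src2, src3, op_type):
    """Return the modeled zero flag for an OR-test or AND-test."""
    if op_type == "or":
        first = src1 | src2
    elif op_type == "and":
        first = src1 & src2
    else:
        raise ValueError("op_type must be 'or' or 'and'")
    # Assumption: the second operation is an AND against the third source.
    second = first & src3
    # Assumption: the condition flag is a zero flag.
    return second == 0
```

A conditional branch fused with this test would then branch on the returned flag.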
Abstract:
PROBLEM TO BE SOLVED: To access memory locations in only a limited range of a memory in response to a vector memory access instruction. SOLUTION: A processor 100 includes a plurality of packed data registers 107, and also includes execution logic 109 coupled with the packed data registers. The execution logic is operable in response to a limited range vector memory access instruction 103, indicating a source packed memory index having a plurality of packed memory indices selected from 8-bit memory indices and 16-bit memory indices. The execution logic is operable to access memory locations in only a limited range of a memory in response to the limited range vector memory access instruction.
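One way to picture the limited range is a gather-style load whose indices are only 8 or 16 bits wide, so that no access can land more than 2^8 or 2^16 elements past the base. The sketch below assumes a gather (rather than a scatter), unsigned indices, and element-granular addressing; the name `limited_range_gather` and the `base` parameter are hypothetical.

```python
def limited_range_gather(memory, base, indices, index_bits=8):
    """Gather memory[base + i] for each narrow index i."""
    if index_bits not in (8, 16):
        raise ValueError("indices are either 8-bit or 16-bit")
    limit = 1 << index_bits
    for i in indices:
        # Assumption: indices are unsigned, so the reachable range is
        # exactly [base, base + limit).
        if not 0 <= i < limit:
            raise ValueError("index does not fit in the index width")
    return [memory[base + i] for i in indices]
```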
Abstract:
PROBLEM TO BE SOLVED: To provide methods and apparatus for fusing instructions to provide OR-test and AND-test functionality on multiple test sources. SOLUTION: A method for fusing instructions in a processor includes: fetching a plurality of instructions, including a first instruction specifying a first operand destination, a second instruction specifying a second operand source, and a third instruction specifying a branch condition; and fusing a portion of the plurality of instructions into a single micro-operation, the portion including both the first and second instructions if the first operand destination and the second operand source are the same and the branch condition is dependent upon the second instruction.
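The fusion condition can be expressed as a small decision function. This is a sketch only: the dictionary keys (`name`, `dest`, `src`, `flag_source`) are hypothetical stand-ins for decoded instruction fields, and the branch is kept as a separate micro-operation here even though the abstract leaves open whether it joins the fused portion.

```python
def fuse_micro_ops(first, second, branch):
    """Return the micro-op list for three fetched instructions."""
    same_operand = first["dest"] == second["src"]
    branch_depends = branch["flag_source"] == second["name"]
    if same_operand and branch_depends:
        # First and second instructions collapse into one micro-op.
        return [("fused", first["name"], second["name"]), (branch["name"],)]
    # Otherwise each instruction becomes its own micro-op.
    return [(first["name"],), (second["name"],), (branch["name"],)]
```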
Abstract:
PROBLEM TO BE SOLVED: To provide a general-purpose operation for adjusting mask bits within writemask registers that correspond to elements in a vector register referenced by a SIMD operation instruction. SOLUTION: The execution of a KZBTZ instruction detects a trailing least significant zero bit position in a first input mask and sets an output mask to have values of the first input mask, but with all bit positions closer to the most significant bit position than the trailing least significant zero bit position in the first input mask set to zero. In some embodiments, a second input mask is used as a writemask such that bit positions of the first input mask are not considered in the trailing least significant zero bit position calculation depending upon a corresponding bit position in the second input mask.
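A bit-level sketch of the KZBTZ behavior is given below. It assumes a 16-bit mask by default, that a mask with no zero bit passes through unchanged, and that a writemask bit of 0 excludes the matching position from the zero search; these details, and the function signature, are assumptions rather than statements from the abstract.

```python
def kzbtz(mask, width=16, writemask=None):
    """Keep the input mask bits below its least significant zero bit
    and clear everything above that position."""
    full = (1 << width) - 1
    mask &= full
    if writemask is None:
        writemask = full      # consider every bit position
    for pos in range(width):
        considered = (writemask >> pos) & 1
        if considered and not (mask >> pos) & 1:
            # Found the least significant considered zero: keep lower bits.
            return mask & ((1 << pos) - 1)
    return mask               # assumption: no zero found, mask unchanged
```

For an 8-bit mask of `0b10110111`, the first zero is at position 3, so only the three trailing ones survive.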
Abstract:
“Apparatus and method for reversing and permuting bits in a mask register”: An apparatus and method are described for performing a bit reversal and permutation on mask values. For example, a processor is described to execute an instruction to perform the operations of: reading a plurality of mask bits stored in a source mask register, the mask bits being associated with vector data elements of a vector register; and performing a bit reversal operation to copy each mask bit from the source mask register to a destination mask register, wherein the bit reversal operation causes the bits of the source mask register to be reversed within the destination mask register, resulting in a symmetric mirror image of the original bit arrangement.
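The mask bit reversal can be modeled in a few lines. The 16-bit default width and the function name `reverse_mask` are assumptions made for illustration.

```python
def reverse_mask(mask, width=16):
    """Copy each source mask bit to the mirrored destination position."""
    out = 0
    for i in range(width):
        if (mask >> i) & 1:
            # Bit i of the source lands at bit (width - 1 - i).
            out |= 1 << (width - 1 - i)
    return out
```

Applying the function twice with the same width returns the original mask, which matches the mirror-image description.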
Abstract:
A mask generating instruction is executed by a processor to improve efficiency of vector operations on an array of data elements. The processor includes vector registers, one of which stores data elements of an array. The processor further includes execution circuitry to receive a mask generating instruction that specifies at least a first operand and a second operand. Responsive to the mask generating instruction, the execution circuitry is to shift bits of the first operand to the left by a number of times defined in the second operand, and pull in a bit of one from the right each time a most significant bit of the first operand is shifted out from the left to generate a result. Each bit in the result corresponds to one of the data elements of the array.
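The shift-and-fill step can be sketched as follows, reading the abstract to mean that a one is pulled in from the right on every single-bit left shift. The 16-bit default width and the name `generate_mask` are assumptions.

```python
def generate_mask(first, count, width=16):
    """Shift left count times, pulling in a 1 from the right each time."""
    full = (1 << width) - 1
    result = first & full
    for _ in range(count):
        # The most significant bit falls off; a one enters at bit 0.
        result = ((result << 1) | 1) & full
    return result
```

Starting from zero, a count of `n` yields a mask with the low `n` bits set, one bit per remaining array element.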
Abstract:
“Method and apparatus for performing a vector bit gather”: An apparatus and method are described for performing a vector bit gather. For example, one embodiment of a processor comprises: a first vector register to store one or more source data elements; a second vector register to store one or more control elements, each of the control elements comprising a plurality of bit fields, each bit field to be associated with a corresponding bit position in a destination vector register and to identify a bit from the one or more source data elements to be copied to each of the particular bit positions; and vector bit gather logic to read each bit field from the second vector register to identify a bit from the one or more source data elements and to responsively copy the bit from each of the one or more source data elements to each of the corresponding bit positions in the destination vector register.
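A single-element sketch of the bit gather: each control field names a source bit, and the field's position in the list selects the destination bit. It assumes one 64-bit source element and control fields given as plain integers that wrap modulo the source width; the name `bit_gather` is hypothetical.

```python
def bit_gather(source, control_fields, src_width=64):
    """Build a result whose bit j is the source bit named by field j."""
    dest = 0
    for dest_pos, src_pos in enumerate(control_fields):
        # Assumption: out-of-range field values wrap modulo the width.
        bit = (source >> (src_pos % src_width)) & 1
        dest |= bit << dest_pos
    return dest
```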
Abstract:
Apparatus, method, and system for performing efficient instruction fusion. The present invention relates to a technique for enabling efficient instruction fusion within a computing system. In one embodiment, processing logic delays the processing of a second instruction, for a threshold amount of time, if a first instruction within an instruction queue is fusible with the second instruction.