Apparatus and method for vector multiply and accumulate of packed bytes
Abstract:
An apparatus and method for performing multiply-accumulate operations. For example, one embodiment of a processor comprises: a decoder to decode instructions; a first source register to store a first plurality of packed bytes; a second source register to store a second plurality of packed bytes; a third source register to store a plurality of packed doublewords; execution circuitry to execute a first instruction, the execution circuitry comprising: extension circuitry to sign-extend or zero-extend the first and second plurality of packed bytes to generate a first and second plurality of words corresponding to the first and second plurality of packed bytes; multiplier circuitry to multiply each of the first plurality of words with a corresponding one of the second plurality of words to generate a plurality of temporary products; adder circuitry to add at least a first set of the temporary products to generate a first temporary sum; accumulation circuitry to combine the first temporary sum with a first packed doubleword value from a first doubleword location in the third source register to generate a first accumulated doubleword result; a destination register to store the first accumulated doubleword result in the first doubleword location.
Information query
Patent Agency Ranking
0/0