Vector instruction for accumulating and compressing values based on input mask
Abstract:
A processor includes a decode circuit to decode an instruction into a decoded instruction, and an execution circuit to execute the decoded instruction. Executing the decoded instruction sums the values of one or more contiguous elements of an input vector that form a block, producing an accumulated value for the block, and stores that accumulated value in a destination vector. An input mask dictates which contiguous elements of the input vector form the block.
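The behavior described in the abstract can be illustrated with a scalar reference model. The sketch below is only one possible reading: the abstract does not say how the mask encodes block boundaries, so the model assumes that a set mask bit marks the last element of a block, that a trailing run without a closing bit still forms a block, and that block sums are packed into the low elements of the destination with the remainder zeroed. The function name `accumulate_compress` is hypothetical and does not come from the patent.

```python
from typing import List


def accumulate_compress(values: List[int], mask: List[int]) -> List[int]:
    """Scalar model of a masked accumulate-and-compress operation.

    Assumptions (not stated in the abstract):
      * mask[i] == 1 marks element i as the last element of a block;
      * a trailing run with no closing mask bit is treated as a final block;
      * block sums are packed contiguously from destination element 0,
        and unused destination elements are set to zero.
    """
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")

    dest = [0] * len(values)
    out = 0             # next free destination slot
    acc = 0             # running sum of the current block
    open_block = False  # True while a block has elements but no closing bit

    for v, m in zip(values, mask):
        acc += v
        open_block = True
        if m:
            # The mask bit closes the block: store its sum and start a new one.
            dest[out] = acc
            out += 1
            acc = 0
            open_block = False

    if open_block:
        # Flush a trailing block that was never closed by a mask bit.
        dest[out] = acc

    return dest
```

Under these assumptions, `accumulate_compress([1, 2, 3, 4, 5, 6, 7, 8], [0, 1, 1, 0, 0, 1, 0, 1])` forms the blocks `[1, 2]`, `[3]`, `[4, 5, 6]`, and `[7, 8]`, and returns `[3, 3, 15, 15, 0, 0, 0, 0]`. A hardware implementation would operate on fixed-width vector registers rather than Python lists; the model only shows the data movement.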