Abstract:
A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Abstract:
A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Abstract:
Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored registers in a register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, wherein the execution unit is capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions.
Abstract:
A system and software for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying both a mask and a register containing data, the mask comprising fields that each correspond to a field of the data contained in the register, the execution unit is operable to detect some of the fields of the mask as having a predetermined value and identifying corresponding fields of the data contained in the register as write-enabled data fields; and cause the write-enabled data fields to be written to a specified memory location.
Abstract:
A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Abstract:
A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Abstract:
A multiplier array processing system which improves the utilization of the multiplier and adder array for lower-precision arithmetic is described. New instructions are defined which provide for the deployment of additional multiply and add operations as a result of a single instruction, and for the deployment of greater multiply and add operands as the symbol size is decreased.
Abstract:
A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Abstract:
A programmable processor and system for improving the performance of processors by incorporating an execution unit operable to decode and execute single instructions specifying three registers each containing a plurality of data elements, the execution unit operable to multiply the first and second registers and add the third register to produce a catenated result containing a plurality of data elements. Additional instructions provide group floating-point subtract, add, multiply, set less, and set greater equal operations. The set less and set greater equal operations produce alternatively zero or an identity element for each element of a catenated result, the result facilitating alternative selection of individual data elements using bitwise Boolean operations and without requiring conditional branch operations.
Abstract:
Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on multiple data elements stored registers in a register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results, wherein the execution unit is capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions.