Abstract:
A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Abstract:
A vector-matrix multiplier unit fully utilizes a 128x128b data path for operand sizes from 8 to 128b and operand types including signed, unsigned or complex, and fixed-, floating-point, polynomial, or Galois-field while maintaining full internal precision. The present disclosure provides a system and method for improving the performance of general-purpose processor, by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128b by 128b multipliers regardless of the operand size, as the number of elements of the matrix and vector operands increase as operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy with modest resources.
Abstract:
The present invention provides a system and method for improving the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128b by 128b multipliers regardsless of the operand size, as the number of elements of the matrix and vector operands increase as operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy with modest resources.
Abstract:
Expandably wide operations are disclosed in which operands wider than the data path between a processor and memory are used in executing instructions. The expandably wide operands reduce the influence of the characteristics of the associated processor in the design of functional units performing calculations, including the width of the register file, the processor clock rate, the exception subsystem of the processor, and the sequence of operations in loading and use of the operand in a wide cache memory.
Abstract:
The present invention provides a cross-bar circuit (100) that implements a switch (115) of a broadband processor. The cross-bar circuit (100) includes: a switch circuit (115) which includes 2m.2n:1 multiplexor circuits (202-204) where each of the 2n:1 multiplexor circuits (202-204) has a unique n-bit index input, one disable input, and a 2n-bit wide source input receives an n-bit index at the n-bit index input, a disable bit at the disable input, and the 2n-bit input source word at the 2n-bit wide source input, and decodes the n-bit index either to select and output as an output destination bit one bit from the 2n-bit input source word if the disable bit has a logic low value; a cache memory (110) that has 2m cache datapath inputs; and 2m cache index input; and a control circuit (105) that has a plurality of control inputs receives the partially decoded instruction information on the plurality of control inputs, provides a second set of the n-bit indexes for the switch circuit (115), and provides the disable bits for the switch circuit (115) where the control circuit (105) is logically coupled to the switch circuit (115) and to the cache memory (110).
Abstract:
A general purpose processor with four copies of an access unit, with an access instruction fetch queue A-queue (101-104). Each A-queue (101-104) is coupled to an access register file AR (105-108) which is coupled to two access functional units A (109-116). In a typical embodiment, each thread of the processor may have on the order of sixty-four general purpose registers. The access unit functions independently by four simultaneous threads of execution, and each compute control flow by performing arithmetic and branch instructions and access memory by performing load and store instructions. These access units also provide wide specifiers for wide operand instructions. These eight access functional units A (109-116) produce results for access register files (105-108) and memory addresses to a shared memory system (117-120).
Abstract:
Expandably wide operations are disclosed in which operands wider than the data path between a processor and memory are used in executing instructions. The expandably wide operands reduce the influence of the characteristics of the associated processor in the design of functional units performing calculations, including the width of the register file, the processor clock rate, the exception subsystem of the processor, and the sequence of operations in loading and use of the operand in a wide cache memory.
Abstract:
A general purpose, programmable media processor (12) for processing and transmitting a media data streams. The media processor (12) incorporates an execution unit (100) that maintains substantially peak data through out of media data streams. The execution unit (100) includes a dynamically partionable multi-precision arithmetic unit (102), programmable switch (104) and programmable extended mathematical element (106). A high bandwidth external interface (124) supplies media data streams at substantially peak rates to a general purpose register file (110) and the execution unit. A memory management unit, and instruction and data cache/buffers (118, 120) are provided. The general purpose, programmable media processor (12) is disposed in a network fabric consisting of fiber optic cable, coaxial cable and twisted pair wires to transmit, process and receive single or unified media data streams.
Abstract:
A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width accessible number of general purpose registers.
Abstract:
A processor and method for performing outer product and outer product accumulation operations on vector operands requiring large numbers of multiplies and accumulations are disclosed.