Abstract:
A processor and method for performing outer product and outer product accumulation operations on vector operands requiring large numbers of multiplies and accumulations are disclosed.
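The operation is easier to follow when modeled in software. The sketch below is a plain scalar reference model of the arithmetic only. The function names are hypothetical and it does not reflect the processor's instruction encoding or hardware. It shows why an outer product of an n-element vector and an m-element vector needs n·m multiplies, and why an outer product accumulation also needs n·m additions. It also shows how a matrix product can be built from repeated outer product accumulations.

```python
from typing import List

Matrix = List[List[int]]

def outer_product(a: List[int], b: List[int]) -> Matrix:
    """Return a (x) b as rows: result[i][j] = a[i] * b[j] (len(a) * len(b) multiplies)."""
    return [[x * y for y in b] for x in a]

def outer_product_accumulate(acc: Matrix, a: List[int], b: List[int]) -> Matrix:
    """Add a (x) b into acc in place: acc[i][j] += a[i] * b[j]; returns acc."""
    if len(acc) != len(a) or any(len(row) != len(b) for row in acc):
        raise ValueError("accumulator shape must be len(a) x len(b)")
    for i, x in enumerate(a):
        row = acc[i]
        for j, y in enumerate(b):
            row[j] += x * y
    return acc

def matmul_via_outer_products(A: Matrix, B: Matrix) -> Matrix:
    """Compute A @ B as the sum over t of (column t of A) (x) (row t of B)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for t in range(k):
        column = [A[i][t] for i in range(n)]
        outer_product_accumulate(C, column, B[t])
    return C

print(outer_product([1, 2, 3], [4, 5]))                        # [[4, 5], [8, 10], [12, 15]]
print(matmul_via_outer_products([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each call to `outer_product_accumulate` corresponds to one rank-1 update. This is the multiply-heavy step that a single wide instruction of this kind would replace.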
Abstract:
A general purpose processor has four copies of an access unit, each with an access instruction fetch queue, A-queue (101-104). Each A-queue (101-104) is coupled to an access register file AR (105-108), which is coupled to two access functional units A (109-116). In a typical embodiment, each thread of the processor may have on the order of sixty-four general purpose registers. The access units function independently for four simultaneous threads of execution; each computes control flow by performing arithmetic and branch instructions and accesses memory by performing load and store instructions. These access units also provide wide specifiers for wide operand instructions. The eight access functional units A (109-116) produce results for the access register files (105-108) and memory addresses for a shared memory system (117-120).
Abstract:
A programmable processor and method improve processor performance by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path. The present invention provides operands that are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data-path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions, and apparatus for implementing them, are described that obtain performance advantages when the operands are not limited to the width and accessible number of general purpose registers.
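One general purpose register can specify both where a wide operand lives and how large it is. The sketch below shows one way to do this. It rests on two assumptions. First, the operand size is a power of two and the address is aligned to that size, so the low-order address bits are free. Second, the register stores `base + size // 2`. This encoding, the 64-bit `DATA_PATH_BYTES`, and all function names are hypothetical illustrations, not the claimed format. Shape information is left out.

```python
from typing import List, Tuple

DATA_PATH_BYTES = 8  # hypothetical 64-bit data path width

def encode_wide_specifier(base: int, size: int) -> int:
    """Pack an aligned base address and a power-of-two size (bytes) into one register value.

    Because base is a multiple of size, its low log2(size) bits are zero, so adding
    size // 2 marks the size without disturbing any address bits.
    """
    if size < 2 or size & (size - 1):
        raise ValueError("size must be a power of two >= 2")
    if base % size:
        raise ValueError("base must be aligned to size")
    return base + size // 2

def decode_wide_specifier(spec: int) -> Tuple[int, int]:
    """Recover (base, size) from a value produced by encode_wide_specifier."""
    half = spec & -spec  # lowest set bit is size // 2
    return spec - half, half * 2

def load_wide_operand(memory: bytes, spec: int) -> List[int]:
    """Read the whole operand and split it into data-path-width little-endian words."""
    base, size = decode_wide_specifier(spec)
    chunk = memory[base:base + size]
    return [int.from_bytes(chunk[i:i + DATA_PATH_BYTES], "little")
            for i in range(0, size, DATA_PATH_BYTES)]

memory = bytes(range(64))
spec = encode_wide_specifier(32, 16)
print(spec)                          # 40
print(decode_wide_specifier(spec))   # (32, 16)
print(len(load_wide_operand(memory, spec)))  # 2 data-path words from one register
```

Alignment is what makes a single register sufficient here. It trades some addressing flexibility for a specifier that fits in the same width as an ordinary pointer.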
Abstract:
A general purpose, programmable media processor (12) processes and transmits media data streams. The media processor (12) incorporates an execution unit (100) that maintains substantially peak data throughput of media data streams. The execution unit (100) includes a dynamically partitionable multi-precision arithmetic unit (102), a programmable switch (104) and a programmable extended mathematical element (106). A high bandwidth external interface (124) supplies media data streams at substantially peak rates to a general purpose register file (110) and the execution unit. A memory management unit and instruction and data caches/buffers (118, 120) are provided. The general purpose, programmable media processor (12) is disposed in a network fabric consisting of fiber optic cable, coaxial cable and twisted pair wires to transmit, process and receive single or unified media data streams.
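A "dynamically partitionable" arithmetic unit is one where a single wide adder is split at run time into independent lanes, with carries blocked at lane boundaries. The sketch below is a software model of that idea on a 64-bit word. It assumes lane widths of 8, 16, 32 or 64 bits with wraparound per lane. The function names are hypothetical, and neither function describes the actual circuit. The first function is a lane-by-lane reference. The second uses the common "SIMD within a register" masking trick to get the same result with one full-width add.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1

def partitioned_add_reference(x: int, y: int, lane_bits: int) -> int:
    """Add lane by lane; each lane wraps modulo 2**lane_bits and never carries into the next."""
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported lane width")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, WORD_BITS, lane_bits):
        a = (x >> shift) & lane_mask
        b = (y >> shift) & lane_mask
        result |= ((a + b) & lane_mask) << shift
    return result

def partitioned_add_swar(x: int, y: int, lane_bits: int) -> int:
    """Same result using one wide add: clear each lane's top bit, add, then XOR the top bits back."""
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported lane width")
    high = 0
    for shift in range(0, WORD_BITS, lane_bits):
        high |= 1 << (shift + lane_bits - 1)  # top bit of every lane
    low_sum = (x & ~high & WORD_MASK) + (y & ~high & WORD_MASK)  # cannot carry out of a lane
    return (low_sum ^ ((x ^ y) & high)) & WORD_MASK

# The same operands give different results depending on the partition chosen at run time.
print(hex(partitioned_add_swar(0xFF, 0x01, 8)))   # 0x0   (8-bit lane wraps)
print(hex(partitioned_add_swar(0xFF, 0x01, 16)))  # 0x100 (carry stays inside a 16-bit lane)
```

Selecting `lane_bits` per operation mirrors how one datapath can serve as many narrow adders or fewer wide ones.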