Abstract:
In lieu of branch prediction, a merged fetch-branch unit operates in parallel with the decode unit within a processor. Upon detection of a branch instruction within a group of one or more fetched instructions, any instructions preceding the branch are marked- regular instructions, the branch instruction is marked as such, and any instructions following branch are marked sequential instructions. Within two cycles, sequential instructions following the last fetched instruction are retrieved and marked, target instructions beginning at the branch target address are retrieved and marked, and the branch is resolved. Either the sequential or target instructions are then dropped depending on the branch resolution, incurring a fixed, 1 cycle branch penalty.
Abstract:
A hyperprocessor includes a control processor controlling tasks executed by a plurality of processor cores, each of which may include multiple execution units, or special hardware units. The control processor schedules tasks according to control threads for the tasks created during compilation and comprising a hardware context including register files, a program counter and status bits for the respective task. The tasks are dispatched to the processor cores or special hardware units for parallel, sequential, out-of-order or speculative execution. A universal register file contains data to be operated on by the task, and an interconnect couples at least the processor cores or special hardware units to each other and to the universal register file, allowing each node to communicate with any other node.
Abstract:
A circular buffer storing packets for processing by one or more network processors employs an empty buffer address register identifying where a next received packet should be stored, a next packet address register identifying the next packet to be processed, and a packet-processing address register within each network processor identifying the packet being processed by that network processor. The n-bit addresses to the buffer are mapped or masked from/to the m-bit packet-processing address registers by software, allowing the buffer size to be fully scalable. A dedicated packet retrieval instruction supported by the network processor(s) retrieves a new packet for processing using the next packet address register and copies that into the associated packet-processing address register for use in subsequent accesses. Buffer management is thus independent of the network processor architecture.
Abstract:
In lieu of branch prediction, a merged fetch-branch unit operates in parallel with the decode unit within a processor. Upon detection of a branch instruction within a group of one or more fetched instructions, any instructions preceding the branch are marked- regular instructions, the branch instruction is marked as such, and any instructions following branch are marked sequential instructions. Within two cycles, sequential instructions following the last fetched instruction are retrieved and marked, target instructions beginning at the branch target address are retrieved and marked, and the branch is resolved. Either the sequential or target instructions are then dropped depending on the branch resolution, incurring a fixed, 1 cycle branch penalty.