Abstract:
Parallelization of scalar operations by vector processors using data-indexed accumulators in vector register files, related circuits, methods, and computer-readable media are disclosed. In one aspect, a vector processor comprises a vector register file providing a plurality of write ports and a plurality of vector registers each providing a plurality of accumulators. The vector processor receives an input data vector. For each of the plurality of write ports, the vector processor executes vector operation(s) for accessing an input data value of the input data vector, and determining, based on the input data value, a register index for a vector register among the plurality of vector registers, and an accumulator index for an accumulator among the plurality of accumulators of the vector register. Based on the register index, a register value is retrieved from the register index, and a scalar operation is performed based on the register value and the accumulator index.
Abstract:
Embodiments disclosed in the detailed description include hybrid writethrough/ write-back cache policy managers, and related systems and methods. A cache write policy manager is configured to determine whether at least two caches among a plurality of parallel caches are active. If all of one or more other caches are not active, the cache write policy manager is configured to instruct an active cache among the parallel caches to apply a write-back cache policy. In this manner, the cache write policy manager may conserve power and/or increase performance of a singly active processor core. If any of the one or more other caches are active, the cache write policy manager is configured to instruct an active cache among the parallel caches to apply a write-through cache policy. In this manner, the cache write policy manager facilitates data coherency among the parallel caches when multiple processor cores are active.
Abstract:
Systems and method for reducing interrupt latency time in a multi-threaded processor. A first interrupt controller is coupled to the multi-threaded processor. A second interrupt controller is configured to communicate a first interrupt and a first vector identifier to the first interrupt controller, wherein the first interrupt controller is configured to process the first interrupt and the first vector identifier and send the processed interrupt to a thread in the multi-threaded processor. Logic is configured to determine when the multi-threaded processor is ready to receive a second interrupt. A dedicated line is used to communicate an indication to the second interrupt controller that the multi-threaded processor is ready to receive the second interrupt.
Abstract:
An instruction specifies a source value and an offset value. Upon execution of the instruction, a first result of the instruction and a second result of the instruction are generated. The first result is a first portion of the source value and the second result is a second portion of the source value.
Abstract:
An apparatus includes a memory that stores an instruction including an opcode and an operand. The operand specifies an immediate value or a register indicator of a register storing the immediate value. The immediate value is usable to identify a function call address. The function call address is selectable from a plurality of function call addresses.
Abstract:
A dedicated arithmetic decoding instruction is disclosed. In a particular embodiment, an apparatus includes a memory and a processor coupled to the memory. The processor is configured to execute general purpose instructions and to execute a dedicated arithmetic decoding instruction retrieved from the memory.
Abstract:
In a particular embodiment, a cache is disclosed that includes a tag state array that includes a tag area addressable by a set index. The tag state array also includes a state area addressable by a state address, where the set index and the state address include at least one common bit.
Abstract:
A system and method to execute a linear feedback-shift instruction is disclosed. In a particular embodiment the method includes executing an instruction at a processor by receiving source data and executing a bitwise logical operation on the source data and on reference data to generate intermediate data. The method further includes determining a parity value of the intermediate data, shifting the source data, and entering the parity value of the intermediate data into a data field of the shifted source data to produce resultant data.
Abstract:
A multithreaded processor capable of allocating interrupts is described. In one embodiment, the multithreaded processor includes an interrupt module and threads for executing tasks. The interrupt module can identify a priority for each thread based on a task priority for tasks being executed by the threads and assign an interrupt to a thread based at least on its priority.
Abstract:
A system including a dual function adder is described. In one embodiment, the system includes an adder. The adder is configured for a first instruction to determine an address for a hardware prefetch if the first instruction is a hardware prefetch instruction. The adder is further configures for the first instruction to determine a value from an arithmetic operation if the first instruction is an arithmetic operation instruction.