Abstract:
Systems and methods for prefetching cache lines into a cache coupled to a processor. A hardware prefetcher is configured to recognize a memory access instruction as an autoincrement-address (AIA) memory access instruction, infer a stride value from an increment field of the AIA instruction, and prefetch lines into the cache based on the stride value. Additionally or alternatively, the hardware prefetcher is configured to recognize that prefetched cache lines are part of a hardware loop, determine a maximum loop count of the hardware loop, and a remaining loop count as a difference between the maximum loop count and a number of loop iterations that have been completed, select a number of cache lines to prefetch, and truncate an actual number of cache lines to prefetch to be less than or equal to the remaining loop count, when the remaining loop count is less than the selected number of cache lines.
Abstract:
A custom entropy bounded encoding in an X-index and Y-index format is generated for a segment of program code, along with a custom decoding dictionary made up of an X pattern memory and a Y pattern memory. In run time decoding, a mix mask is used with an X pattern selected from the X pattern memory according to the X-index and with a Y pattern selected from the Y pattern memory according to the Y-index to determine an executable instruction. The mix mask identifying the order of bits to combine from the X pattern and the Y pattern. Appropriate hardware implementation and placement of the decoding mechanism and address translation is described during execution of an encoded code segment. Methods, including a genetic process, are also described to determine the X-index, the Y-index, the X patterns, the Y patterns, and one or more mix masks.
Abstract:
A system and method to evaluate a data value as an instruction is disclosed. For example, an apparatus configured to execute program code includes an execute unit configured to execute a first instruction associated with a location of a second instruction. The first instruction is identified by a program counter. The apparatus also includes a decode unit configured to receive the second instruction from the location and to decode the second instruction to generate a decoded second instruction without changing the program counter to point to the second instruction. The first and second instruction are virtual machine instructions and the execute unit is adapted to interprete these virtual machine instructions.
Abstract:
A system and method of processing a hierarchical very long instruction word (VLIW) packet is disclosed. In a particular embodiment, a method of processing instructions is disclosed. The method includes receiving a hierarchical VLIW packet of instructions and decoding an instruction from the packet to determine whether the instruction is a single instruction or whether the instruction includes a subpacket that includes a plurality of sub-instructions. The method also includes, in response to determining that the instruction includes the subpacket, executing each of the sub-instructions.
Abstract:
Loop control systems and methods are disclosed. In a particular embodiment, a hardware loop control logic circuit includes a detection unit to detect an end of loop indicator of a program loop. The hardware loop control logic circuit also includes a decrement unit to decrement a loop count and to decrement a predicate trigger counter. The hardware loop control logic circuit further includes a comparison unit to compare the predicate trigger counter to a reference to determine when to set a predicate value.
Abstract:
A multi-mode register file for each thread of a multi-thread system is described. In one embodiment, the multi-mode register file includes an operand for the thread in a first mode. The multi-mode register file further includes branch prediction information which replaces the operand in a second mode.
Abstract:
A system for determining a cache line to replace is described. In one embodiment, the system includes a cache comprising a plurality of cache lines. The system further includes an identifier configured to identify a cache line for replacement. The system also includes a control logic configured to determine a value of the identifier selected from an incrementer, a cache maintenance instruction, or remains the same.
Abstract:
Techniques for the design and use of a digital signal processor, including (but not limited to) for processing transmissions in a communications (e.g., CDMA) system. A method and system control transferring data between debugging registers and digital signal processor processes in association with a power transition sequence of the digital signal processor. In a digital signal processor, debugging registers associate with the core processor process and the debugging process. Control bits control transferring data among the debugging registers, the core processor process and the debugging process. The control bit prevents transferring data among the debugging registers, the core processor process and the debugging process in the event of a power transition sequence. Control bits also prevent a power transition sequence of the digital signal processor in the event of transferring data among the debugging registers and the core processor process or the debugging process.
Abstract:
A method and system to perform shifting and rounding operations within a microprocessor, such as, for example, a digital signal processor, during execution of a single instruction are described. An instruction to shift and round data within a source register unit of a register file structure is received within a processing unit. The instruction includes a shifting bit value indicating the bit amount for a right shift operation and is subsequently executed to shift data within the source register unit to the right by an encoded bit value, calculated by subtracting a single bit from the shifting bit value contained within the instruction. A predetermined bit extension is further inserted within the vacated bit positions adjacent to the shifted data. Subsequently, an addition operation is performed on the shifted data and a unitary integer value is added to the shifted data to obtain resulting data. Finally, the resulting data is further shifted to the right by a single bit value and a predetermined bit extension is inserted within the vacated bit position to obtain the final rounded data results to be stored within a destination register unit.
Abstract:
A method and system to indicate which page within a software-managed page table triggers an exception within a microprocessor, such as, for example, a digital signal processor, wherein a software-managed translation lookaside buffer (TLB) module receives a virtual address produced by an instruction within a Very Long Instruction Word (VLIW) packet, such as, for example, a fetch instruction, and further compares the virtual address to each stored TLB entry. If a match exists, then the TLB module outputs a corresponding mapped physical address for the instruction. Otherwise, if the VLIW packet spans two pages, where a first page is present as a TLB entry within the TLB module and the second page is missing from the stored TLB entries, an indication bit within a data field of a control register is set to identify the TLB miss exception to a software management unit. The software management unit retrieves the indication bit information from the register and further performs a page table look-up within the software-managed page table using the indication bit information in order to retrieve the missing page information. Subsequently, the missing page information is written into a new TLB entry within the TLB module for subsequent virtual address translation and execution of the packet of instructions.