Abstract:
Systems and methods for performing non-allocating memory access instructions with physical address. A system includes a processor, one or more levels of caches, a memory, a translation look-aside buffer (TLB), and a memory access instruction specifying a memory access by the processor and an associated physical address. Execution logic is configured to bypass the TLB for the memory access instruction and perform the memory access with the physical address, while avoiding allocation of one or more intermediate levels of caches where a miss may be encountered.
Abstract:
An example method for placing one or more element data values into an output vector includes identifying a vertical permute control vector including a plurality of elements, each element of the plurality of elements including a register address. The method also includes for each element of the plurality of elements, reading a register address from the vertical permute control vector. The method further includes retrieving a plurality of element data values based on the register address. The method also includes identifying a horizontal permute control vector including a set of addresses corresponding to an output vector. The method further includes placing at least some of the retrieved element data values of the plurality of element data values into the output vector based on the set of addresses in the horizontal permute control vector.
Abstract:
An apparatus and method to translate virtual addresses to physical addresses in a base plus offset addressing mode are disclosed. In an embodiment, a method includes performing a first translation lookaside buffer (TLB) lookup based on a base address value to retrieve a speculative physical address. While performing the TLB lookup based on the base address value, the base address value is added to an offset value to generate an effective address value. The method also includes performing a comparison of the base address value and the effective address value based on a variable page size to determine whether the speculative physical address corresponds to the effective address.
Abstract:
A memory management unit (MMU) for servicing transaction requests from one or more processor threads is described. The MMU can include a translation lookaside buffer (TLB). The TLB can include a storage module and a logic circuit. The storage module can store a bit indicating one of a plurality of interfaces. The bit can be associated with a physical address range. The logic circuit can route a physical address within the physical address range to the one of the plurality of interfaces.
Abstract:
Techniques are provided for executing a vector alignment instruction. A scalar register file in a first processor is configured to share one or more register values with a second processor, the one or more register values accessed from the scalar register file according to an Rt address specified in a vector alignment instruction, wherein a start location is determined from one of the shared register values. An alignment circuit in the second processor is configured to align data identified between the start location within a beginning Vu register of a vector register file (VRF) and an end location of a last Vu register of the VRF according to the vector alignment instruction. A store circuit is configured to select the aligned data from the alignment circuit and store the aligned data in the vector register file according to an alignment store address specified by the vector alignment instruction.
Abstract:
A method includes executing an instruction at a processor, where executing the instruction includes comparing a data value of a plurality of data values to a first element stored at a first location of a storage device. When the data value satisfies a condition with respect to the first element, the method includes moving the first element to a second location of the storage device and inserting the data value into the first location of the storage device.
Abstract:
An example method for placing one or more element data values into an output vector includes identifying a vertical permute control vector including a plurality of elements, each element of the plurality of elements including a register address. The method also includes for each element of the plurality of elements, reading a register address from the vertical permute control vector. The method further includes retrieving a plurality of element data values based on the register address. The method also includes identifying a horizontal permute control vector including a set of addresses corresponding to an output vector. The method further includes placing at least some of the retrieved element data values of the plurality of element data values into the output vector based on the set of addresses in the horizontal permute control vector.
Abstract:
Techniques are provided for executing a vector alignment instruction. A scalar register file in a first processor is configured to share one or more register values with a second processor, the one or more register values accessed from the scalar register file according to an Rt address specified in a vector alignment instruction, wherein a start location is determined from one of the shared register values. An alignment circuit in the second processor is configured to align data identified between the start location within a beginning Vu register of a vector register file (VRF) and an end location of a last Vu register of the VRF according to the vector alignment instruction. A store circuit is configured to select the aligned data from the alignment circuit and store the aligned data in the vector register file according to an alignment store address specified by the vector alignment instruction.
Abstract:
Techniques are provided for executing a vector alignment instruction. A scalar register file in a first processor is configured to share one or more register values with a second processor, the one or more register values accessed from the scalar register file according to an Rt address specified in a vector alignment instruction, wherein a start location is determined from one of the shared register values. An alignment circuit in the second processor is configured to align data identified between the start location within a beginning Vu register of a vector register file (VRF) and an end location of a last Vu register of the VRF according to the vector alignment instruction. A store circuit is configured to select the aligned data from the alignment circuit and store the aligned data in the vector register file according to an alignment store address specified by the vector alignment instruction.
Abstract:
Systems and methods for performing non-allocating memory access instructions with physical address. A system includes a processor, one or more levels of caches, a memory, a translation look-aside buffer (TLB), and a memory access instruction specifying a memory access by the processor and an associated physical address. Execution logic is configured to bypass the TLB for the memory access instruction and perform the memory access with the physical address, while avoiding allocation of one or more intermediate levels of caches where a miss may be encountered.