Abstract:
A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the counters corresponds to one of the instructions in the code page. When the hardware detects a branch instruction having a branch target within the code page, it increments one of the counters that has the same position in the new page as the branch target in the code page. The execution of the code page is repeated and the counters are incremented when branch targets fall within the code page. The hardware then provides the counter values in the new page to a binary translator for binary translation.
Abstract:
A method is described. The method includes receiving an instruction, accessing a return cache to load a predicted return target address upon determining that the instruction is a return instruction, searching a lookup table for executable binary code upon determining that the predicted translated return target address is incorrect and executing the executable binary code to perform a binary translation.
Abstract:
Technologies for partial binary translation on multi-core platforms include a shared translation cache, a binary translation thread scheduler, a global installation thread, and a local translation thread and analysis thread for each processor core. On detection of a hotspot, the thread scheduler first resumes the global thread if suspended, next activates the global thread if a translation cache operation is pending, and last schedules local translation or analysis threads for execution. Translation cache operations are centralized in the global thread and decoupled from analysis and translation. The thread scheduler may execute in a non-preemptive nucleus, and the translation and analysis threads may execute in a preemptive runtime. The global thread may be primarily preemptive with a small non-preemptive nucleus to commit updates to the shared translation cache. The global thread may migrate to any of the processor cores. Forward progress is guaranteed. Other embodiments are described and claimed.
Abstract:
A method and apparatus for implementing a page table walker that uses a sliding field in the virtual addresses to identify entries in a page table. According to one aspect of the invention, an apparatus for use in a computer system is provided that includes a page size storage area (315) and a page table walker. The page size storage area is used to store a number of page sizes selected for translating (305) a number of virtual addresses. The page table walker includes a selection unit (320) coupled to the page size storage area, as well as a page entry address generator (325) coupled to the selection unit. For each virtual address received by the selection unit, the selection unit positions a field in that virtual address based on the page size selected for translating that virtual address. In response to receiving the bits in the field identified for each of the virtual addresses, the page entry address generator (325) identifies an entry in a page table based on those bits.
Abstract:
In a method for switching to a spare processor during runtime, a processing system determines that execution should be migrated off of an active processor. An operating system (OS) scheduler and at least one device are then paused, and the active processor is put into an idle state. State data from writable and substantially non-writable stores in the active processor is loaded into the spare processor. Interrupt routing table logic for the processing system is dynamically reprogrammed to direct external interrupts to the spare processor. The active processor may then be off-lined, and the device and OS scheduler may be unpaused or resumed. Threads may then be dispatched to the spare processor for execution. Other embodiments are described and claimed.
Abstract:
A method and apparatus for implementing a page table walker that uses a sliding field in the virtual addresses to identify entries in a page table. According to one aspect of the invention, an apparatus for use in a computer system is provided that includes a page size storage area and a page table walker. The page size storage area is used to store a number of page sizes selected for translating a number of virtual addresses. The page table walker includes a selection unit coupled to the page size storage area, as well as a page entry address generator coupled to the selection unit. For each virtual address received by the selection unit, the selection unit positions a field in that virtual address based on the page size selected for translating that virtual address. In response to receiving the bits in the field identified for each of the virtual addresses, the page entry address generator identifies an entry in a page table based on those bits.
Abstract:
In one embodiment, the present invention includes a method for receiving control in a kernel mode via a ring transition from a user thread during execution of an unbounded transactional memory (UTM) transaction, updating a state of a transaction status register (TSR) associated with the user thread and storing the TSR with a context of the user thread, and later restoring the context during a transition from the kernel mode to the user thread. In this way, the UTM transaction may continue on resumption of the user thread. Other embodiments are described and claimed.
Abstract:
In a method for switching to a spare processor during runtime, a processing system determines that execution should be migrated off of an active processor. An operating system (OS) scheduler and at least one device are then paused, and the active processor is put into an idle state. State data from writable and substantially non-writable stores in the active processor is loaded into the spare processor. Interrupt routing table logic for the processing system is dynamically reprogrammed to direct external interrupts to the spare processor. The active processor may then be off-lined, and the device and OS scheduler may be unpaused or resumed. Threads may then be dispatched to the spare processor for execution. Other embodiments are described and claimed.