Abstract:
A core includes a memory buffer and executes an instruction within a virtual machine. A processor tracer captures trace data and formats the trace data as trace data packets. An event-based sampler generates field data for a sampling record in response to occurrence of an event of a certain type as a result of execution of the instruction. The processor tracer, upon receipt of the field data: formats the field data into elements of the sampling record as a group of record packets; inserts the group of record packets between the trace data packets as a combined packet stream; and stores the combined packet stream in the memory buffer as a series of output pages. The core, when in guest profiling mode, executes a virtual machine monitor to map output pages of the memory buffer to host physical pages of main memory using multilevel page tables.
Abstract:
Systems and method are disclosed for monitoring processor performance. Embodiments described relate to differentiating function performance by input parameters. In one embodiment, a method includes configuring a counter contained in a processor to count occurrences of an event in the processor and to overflow upon the count of occurrences reaching a specified value, configuring a precise event based sampling (PEBS) handler circuit to generate and store a PEBS record into a PEBS memory buffer after at least one overflow, the PEBS record containing at least one stack entry read from a stack after the at least one overflow, enabling the PEBS handler circuit to generate and store the PEBS record after the at least one overflow, generating and storing the PEBS record into the PEBS memory buffer after the at least one overflow; and storing contents of the PEBS memory buffer to a PEBS trace file in a memory.
Abstract:
A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
Abstract:
In an embodiment, a processor includes at least one core and energy performance gain (EPG) logic to determine an EPG frequency based on a first value of an EPG. The EPG is based upon energy consumed by the processor and upon performance of the processor. The processor also includes a clock generator to generate a frequency of operation of the at least one core based on the EPG frequency. Other embodiments are described and claimed.
Abstract:
A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
Abstract:
In an embodiment, a processor includes at least one core and energy performance gain (EPG) logic to determine an EPG frequency based on a first value of an EPG. The EPG is based upon energy consumed by the processor and upon performance of the processor. The processor also includes a clock generator to generate a frequency of operation of the at least one core based on the EPG frequency. Other embodiments are described and claimed.
Abstract:
A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
Abstract:
A processor includes a memory subunit that includes a status register and an execution engine unit to: randomly select a load operation to monitor; determine a re-order buffer identifier of the load operation; and transmit the re-order buffer identifier to the memory subsystem. Responsive to receipt of the re-order buffer identifier, the first memory subunit is to store a piece of information, related to a status of the load operation, in the status register. The processor also includes logic to, responsive to detection of retirement of the load operation, store memory information in memory-related fields of a record of a memory buffer. The memory information includes auxiliary information (AUX) and access latency information, wherein one of the auxiliary information or the access latency information includes the piece of information, from the status register, stored in a particular field of the memory-related fields.
Abstract:
Detailed herein are examples of hybrid (heterogenous) performance monitoring unit enumeration. In some examples, a processor supports an instruction that enumerates performance monitoring unit enumeration. For example, the processor comprises decoder circuitry to decode an instance of a single instruction, the single instruction to include a field for an opcode; and execution circuitry to execute the decoded instruction according to the opcode to return the processor identification and feature information including an enumeration of heterogenous performance monitoring unit capabilities.
Abstract:
A processor includes a front end including circuitry to decode an instruction from an instruction stream and a core including circuitry to process the instruction. The core includes an execution pipeline, a dynamic core frequency logic unit, and a counter compensation logic unit. The execution pipeline includes circuitry to execute the instruction. The dynamic core frequency logic unit includes circuitry to squash a clock of the core to reduce a core frequency. The clock may not be visible to software. The counter compensation logic unit includes circuitry to adjust a performance counter increment associated with a performance counter based on at least the dynamic core frequency logic unit circuitry to squash a clock of the core to reduce a core frequency.