Method and apparatus for processor performance monitoring
    11.
    发明授权
    Method and apparatus for processor performance monitoring 有权
    用于处理器性能监控的方法和装置

    公开(公告)号:US09465680B1

    公开(公告)日:2016-10-11

    申请号:US14721819

    申请日:2015-05-26

    CPC classification number: G06F9/542 G06F11/00 G06F11/3024 G06F11/3409

    Abstract: A processor and method are described for implementing performance monitoring using a fixed function performance counter. For example, one embodiment of an apparatus comprises: a fixed function performance counter to decrement or increment upon occurrence of an event in the processing device; a precise event based sampling (PEBS) enable control communicably coupled to the fixed function performance counter; a PEBS handler to generate and store a PEBS record comprising architectural metadata defining a state of the processing device at a time of generation of the PEBS record; and a non-precise event based sampling (NPEBS) module communicably coupled to the PEBS enable control and the PEBS handler, the NPEBS module to cause the PEBS handler to generate the PEBS record for the event upon the fixed function performance counter reaching a specified value.

    Abstract translation: 描述了使用固定功能性能计数器实现性能监视的处理器和方法。 例如,设备的一个实施例包括:固定功能性能计数器,用于在处理设备中发生事件时递减或递增; 精确的基于事件的采样(PEBS)使能控制可通信地耦合到固定功能性能计数器; PEBS处理器,用于生成和存储PEBS记录,其包括在生成PEBS记录时定义处理设备的状态的架构元数据; 以及可通信地耦合到PEBS使能控制和PEBS处理器的非精确事件采样(NPEBS)模块,NPEBS模块使固定功能性能计数器达到指定值时使PEBS处理程序生成事件的PEBS记录 。

    Elapsed Cycle Timer in Last Branch Records
    12.
    发明申请
    Elapsed Cycle Timer in Last Branch Records 审中-公开
    最后分行记录中的经过周期计时器

    公开(公告)号:US20160259646A1

    公开(公告)日:2016-09-08

    申请号:US15155204

    申请日:2016-05-16

    Abstract: A processing device implementing an elapsed cycle timer in last branch records (LBRs) is disclosed. A processing device of the disclosure includes a last branch record (LBR) counter to iterate with each cycle of the processing device. The processing device further includes at least one register communicably coupled to the LBR counter, the at least one register to provide an LBR structure comprising a plurality of LBR entries. An LBR entry of the plurality of LBR entries includes an address instruction pointer (IP) of a branch instruction executed by the processing device, an address IP of a target of the branch instruction, and an elapsed time field that stores a value of the LBR counter in response to creation of the LBR entry.

    Abstract translation: 公开了一种在最后的分支记录(LBR)中实现经过周期定时器的处理装置。 本公开的处理装置包括与处理装置的每个周期重复的最后一个分支记录(LBR)计数器。 所述处理设备还包括至少一个可通信地耦合到所述LBR计数器的寄存器,所述至少一个寄存器提供包括多个LBR入口的LBR结构。 多个LBR条目的LBR条目包括由处理装置执行的分支指令的地址指令指针(IP),分支指令的目标的地址IP以及存储LBR的值的经过时间字段 反对创建LBR条目。

    SNAPSHOTTING OF PERFORMANCE MONITORING
    13.
    发明公开

    公开(公告)号:US20240330146A1

    公开(公告)日:2024-10-03

    申请号:US18194400

    申请日:2023-03-31

    CPC classification number: G06F11/3495 G06F11/3409

    Abstract: Techniques for snapshotting of performance monitoring are described. In an embodiment, an apparatus includes a plurality of performance monitoring hardware resources, hardware to capture a record of state data related to state of the apparatus in connection with an occurrence of an event, and storage to store a first indicator corresponding to at least a first performance monitoring hardware resource of the plurality of performance monitoring hardware resources and to enable the hardware to include, in the record, performance data from the first performance monitoring hardware resource.

    Selective use of branch prediction hints

    公开(公告)号:US11809873B2

    公开(公告)日:2023-11-07

    申请号:US17033749

    申请日:2020-09-26

    CPC classification number: G06F9/3842 G06F9/30145 G06F9/3867 G06F12/0804

    Abstract: Embodiments of apparatuses, methods, and systems for selective use of branch prediction hints are described. In an embodiment, an apparatus includes an instruction decoder and a branch predictor. The instruction decoder is to decode a branch instruction having a hint. The branch predictor is to provide a prediction and a hint-override indicator. The hint-override indicator is to indicate whether the prediction is based on stored information about the branch instruction. The prediction is to override the hint if the hint-override indicator indicates that the prediction is based on stored information about the branch instruction.

    Precise longitudinal monitoring of memory operations

    公开(公告)号:US11693588B2

    公开(公告)日:2023-07-04

    申请号:US15929272

    申请日:2020-04-21

    Abstract: A processor includes a memory subunit that includes a status register and an execution engine unit to: randomly select a load operation to monitor; determine a re-order buffer identifier of the load operation; and transmit the re-order buffer identifier to the memory subsystem. Responsive to receipt of the re-order buffer identifier, the first memory subunit is to store a piece of information, related to a status of the load operation, in the status register. The processor also includes logic to, responsive to detection of retirement of the load operation, store memory information in memory-related fields of a record of a memory buffer. The memory information includes auxiliary information (AUX) and access latency information, wherein one of the auxiliary information or the access latency information includes the piece of information, from the status register, stored in a particular field of the memory-related fields.

    Apparatus and method for adaptively scheduling work on heterogeneous processing resources

    公开(公告)号:US11436118B2

    公开(公告)日:2022-09-06

    申请号:US16728617

    申请日:2019-12-27

    Abstract: An apparatus and method for intelligently scheduling threads across a plurality of logical processors. For example, one embodiment of a processor comprises: a plurality of logical processors including comprising one or more of a first logical processor type and a second logical processor type, the first logical processor type associated with a first core type and the second logical processor type associated with a second core type; a scheduler to schedule a plurality of threads for execution on the plurality of logical processors in accordance with performance data associated with the plurality of threads; wherein if the performance data indicates that a new thread should be executed on a logical processor of the first logical processor type, but all logical processors of the first logical processor type are busy, the scheduler to determine whether to migrate a second thread from the logical processors of the first logical processor type to a logical processor of the second logical processor type based on an evaluation of first and second performance values associated with execution of the first thread on the first or second logical processor types, respectively, and further based on an evaluation of third and fourth performance values associated with execution of the second thread on the first or second logical processor types, respectively.

    Instruction and logic for tracking fetch performance bottlenecks

    公开(公告)号:US11256506B2

    公开(公告)日:2022-02-22

    申请号:US16831007

    申请日:2020-03-26

    Inventor: Ahmad Yasin

    Abstract: A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.

    CODE PREFETCH INSTRUCTION
    20.
    发明申请

    公开(公告)号:US20210342134A1

    公开(公告)日:2021-11-04

    申请号:US17033751

    申请日:2020-09-26

    Abstract: Embodiments of apparatuses, methods, and systems for code prefetching are described. In an embodiment, an apparatus includes an instruction decoder, load circuitry, and execution circuitry. The instruction decoder is to decode a code prefetch instruction. The code prefetch instruction is to specify a first instruction to be prefetched. The load circuitry to prefetch the first instruction in response to the decoded code prefetch instruction. The execution circuitry is to execute the first instruction at a fetch stage of a pipeline.

Patent Agency Ranking