-
11.
Publication No.: US09465680B1
Publication date: 2016-10-11
Application No.: US14721819
Filing date: 2015-05-26
Applicant: INTEL CORPORATION
Inventor: Michael W. Chynoweth, Jonathan D. Combs, Angela D. Schmid, Kimberly C. Weier, Ahmad Yasin, Jason W. Brandt, Charlie J. Hewett, Seth Abraham, Matthew C. Merten
CPC classification number: G06F9/542, G06F11/00, G06F11/3024, G06F11/3409
Abstract: A processor and method are described for implementing performance monitoring using a fixed function performance counter. For example, one embodiment of an apparatus comprises: a fixed function performance counter to decrement or increment upon occurrence of an event in the processing device; a precise event based sampling (PEBS) enable control communicably coupled to the fixed function performance counter; a PEBS handler to generate and store a PEBS record comprising architectural metadata defining a state of the processing device at a time of generation of the PEBS record; and a non-precise event based sampling (NPEBS) module communicably coupled to the PEBS enable control and the PEBS handler, the NPEBS module to cause the PEBS handler to generate the PEBS record for the event upon the fixed function performance counter reaching a specified value.
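The counter-triggered record generation described in this abstract can be sketched roughly as follows. This is a toy software model, not the patented hardware: the class name `FixedCounterSampler`, the `threshold` parameter, the re-arming of the counter after each record, and the contents of the state dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FixedCounterSampler:
    """Toy model: a counter that emits a record when it reaches a target."""

    threshold: int  # hypothetical stand-in for the "specified value"
    count: int = 0
    records: list = field(default_factory=list)

    def on_event(self, state: dict) -> None:
        """Count one event; snapshot the state when the target is reached."""
        self.count += 1
        if self.count == self.threshold:
            # Copy the (illustrative) machine state at record time.
            self.records.append(dict(state))
            # Assumption: the counter re-arms for the next sampling period.
            self.count = 0


sampler = FixedCounterSampler(threshold=3)
for ip in range(100, 107):  # seven events with made-up instruction pointers
    sampler.on_event({"ip": ip})
```

With a threshold of 3, the seven events yield two records (taken at the third and sixth events) and leave one event counted toward the next record.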
-
Publication No.: US20160259646A1
Publication date: 2016-09-08
Application No.: US15155204
Filing date: 2016-05-16
Applicant: Intel Corporation
Inventor: Ahmad Yasin, Michael W. Chynoweth, Ofer Levy, Jason W. Brandt, Angela Schmid
CPC classification number: G06F9/3806, G06F9/30058, G06F9/30098, G06F11/3419, G06F11/348, G06F2201/865, G06F2201/88
Abstract: A processing device implementing an elapsed cycle timer in last branch records (LBRs) is disclosed. A processing device of the disclosure includes a last branch record (LBR) counter to iterate with each cycle of the processing device. The processing device further includes at least one register communicably coupled to the LBR counter, the at least one register to provide an LBR structure comprising a plurality of LBR entries. An LBR entry of the plurality of LBR entries includes an address instruction pointer (IP) of a branch instruction executed by the processing device, an address IP of a target of the branch instruction, and an elapsed time field that stores a value of the LBR counter in response to creation of the LBR entry.
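As a rough illustration of an LBR entry that carries an elapsed-cycle field, the sketch below models the structure in software. The names `LbrEntry` and `LbrStack`, the fixed depth, and the choice to reset the counter after each entry is created are assumptions for illustration; the abstract only states that the field stores the counter value when the entry is created.

```python
from collections import deque
from typing import NamedTuple


class LbrEntry(NamedTuple):
    from_ip: int   # address of the branch instruction
    to_ip: int     # address of the branch target
    elapsed: int   # counter value captured when the entry was created


class LbrStack:
    """Toy model of a fixed-depth last-branch-record structure."""

    def __init__(self, depth: int = 4) -> None:
        self.entries = deque(maxlen=depth)  # oldest entries fall off
        self.counter = 0

    def tick(self, cycles: int = 1) -> None:
        """Advance the cycle counter."""
        self.counter += cycles

    def record_branch(self, from_ip: int, to_ip: int) -> None:
        """Create an entry that captures the current counter value."""
        self.entries.append(LbrEntry(from_ip, to_ip, self.counter))
        # Assumption: restart the count so each entry holds a delta.
        self.counter = 0


lbr = LbrStack(depth=2)
lbr.tick(5)
lbr.record_branch(0x10, 0x40)
lbr.tick(3)
lbr.record_branch(0x48, 0x10)
lbr.tick(7)
lbr.record_branch(0x20, 0x80)
```

Because the depth is 2, only the two most recent branches remain, with elapsed values of 3 and 7 cycles.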
-
Publication No.: US20240330146A1
Publication date: 2024-10-03
Application No.: US18194400
Filing date: 2023-03-31
Applicant: Intel Corporation
Inventor: Moshe Cohen, Ahmad Yasin
IPC: G06F11/34
CPC classification number: G06F11/3495, G06F11/3409
Abstract: Techniques for snapshotting of performance monitoring are described. In an embodiment, an apparatus includes a plurality of performance monitoring hardware resources, hardware to capture a record of state data related to state of the apparatus in connection with an occurrence of an event, and storage to store a first indicator corresponding to at least a first performance monitoring hardware resource of the plurality of performance monitoring hardware resources and to enable the hardware to include, in the record, performance data from the first performance monitoring hardware resource.
-
Publication No.: US12008398B2
Publication date: 2024-06-11
Application No.: US16729370
Filing date: 2019-12-28
Applicant: Intel Corporation
Inventor: Ahmad Yasin, Julius Mandelblat, Eliezer Weissmann, Rajshree A. Chabukswar, Michael W. Chynoweth
CPC classification number: G06F9/4881, G06F9/30101, G06F9/321, G06F9/485
Abstract: Embodiments of apparatuses, methods, and systems for performance monitoring in heterogenous systems are described. In an embodiment, an apparatus includes a plurality of performance counters to generate a plurality of unweighted event counts; a weights storage to store a plurality of weight values, each weight value corresponding to an unweighted event count; a plurality of weighting units, each weighting unit to weight a corresponding unweighted event count based on a corresponding weight value to generate one of a plurality of weighted event counts; and a work counter to receive the weighted event counts and generate a measured work amount.
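The weighting scheme in this abstract amounts to a weighted sum of event counts. A minimal sketch, assuming integer counts and weights and a simple sum as the way the work counter combines the weighted counts (the abstract does not specify this), might look like the following; the function name `measured_work` and the sample numbers are hypothetical.

```python
def measured_work(event_counts, weights):
    """Weight each unweighted event count and sum the results."""
    if len(event_counts) != len(weights):
        raise ValueError("each event count needs exactly one weight")
    # One weighting step per counter, then accumulate into the work amount.
    return sum(count * weight for count, weight in zip(event_counts, weights))


counts = [100, 20, 5]    # made-up unweighted event counts
weights = [1, 4, 10]     # made-up per-event weight values
work = measured_work(counts, weights)
```

Here the weighted counts are 100, 80, and 50, so `work` is 230.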
-
Publication No.: US11809873B2
Publication date: 2023-11-07
Application No.: US17033749
Filing date: 2020-09-26
Applicant: Intel Corporation
Inventor: Jared W. Stark, Ahmad Yasin, Ajay Amarsingh Singh
IPC: G06F9/38, G06F9/30, G06F12/0804
CPC classification number: G06F9/3842, G06F9/30145, G06F9/3867, G06F12/0804
Abstract: Embodiments of apparatuses, methods, and systems for selective use of branch prediction hints are described. In an embodiment, an apparatus includes an instruction decoder and a branch predictor. The instruction decoder is to decode a branch instruction having a hint. The branch predictor is to provide a prediction and a hint-override indicator. The hint-override indicator is to indicate whether the prediction is based on stored information about the branch instruction. The prediction is to override the hint if the hint-override indicator indicates that the prediction is based on stored information about the branch instruction.
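The selection rule in this abstract can be sketched as a small function. The function name `resolve_direction` and its boolean parameters are hypothetical, and falling back to the hint when the predictor has no stored information is an assumption inferred from the abstract rather than something it states.

```python
def resolve_direction(hint_taken: bool, predicted_taken: bool,
                      has_stored_info: bool) -> bool:
    """Pick the branch direction to follow.

    has_stored_info models the hint-override indicator: True means the
    prediction is based on stored information about this branch.
    """
    if has_stored_info:
        return predicted_taken  # the prediction overrides the hint
    return hint_taken           # assumption: otherwise follow the hint


# A branch the predictor has history for: the prediction wins.
warm = resolve_direction(hint_taken=True, predicted_taken=False,
                         has_stored_info=True)
# A branch with no stored history: the hint is used.
cold = resolve_direction(hint_taken=True, predicted_taken=False,
                         has_stored_info=False)
```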
-
Publication No.: US20230325192A1
Publication date: 2023-10-12
Application No.: US17705946
Filing date: 2022-03-28
Applicant: Intel Corporation
Inventor: Ahmad Yasin, Nofar Hasson
CPC classification number: G06F9/3844, G06F11/348
Abstract: An embodiment of an integrated circuit may comprise a branch prediction unit to predict branches for an instruction decoder and circuitry coupled to the branch prediction unit, the circuitry to track a performance metric for an individual branch misprediction. Other embodiments are disclosed and claimed.
-
Publication No.: US11693588B2
Publication date: 2023-07-04
Application No.: US15929272
Filing date: 2020-04-21
Applicant: Intel Corporation
Inventor: Ahmad Yasin, Michael Chynoweth, Rajshree Chabukswar, Muhammad Taher
CPC classification number: G06F3/0656, G06F3/0604, G06F3/0653, G06F3/0673, G06F11/3466
Abstract: A processor includes a memory subunit that includes a status register and an execution engine unit to: randomly select a load operation to monitor; determine a re-order buffer identifier of the load operation; and transmit the re-order buffer identifier to the memory subsystem. Responsive to receipt of the re-order buffer identifier, the first memory subunit is to store a piece of information, related to a status of the load operation, in the status register. The processor also includes logic to, responsive to detection of retirement of the load operation, store memory information in memory-related fields of a record of a memory buffer. The memory information includes auxiliary information (AUX) and access latency information, wherein one of the auxiliary information or the access latency information includes the piece of information, from the status register, stored in a particular field of the memory-related fields.
-
18.
Publication No.: US11436118B2
Publication date: 2022-09-06
Application No.: US16728617
Filing date: 2019-12-27
Applicant: Intel Corporation
Inventor: Eliezer Weissmann, Omer Barak, Rajshree Chabukswar, Russell Fenger, Eugene Gorbatov, Monica Gupta, Julius Mandelblat, Nir Misgav, Efraim Rotem, Ahmad Yasin
Abstract: An apparatus and method for intelligently scheduling threads across a plurality of logical processors. For example, one embodiment of a processor comprises: a plurality of logical processors comprising one or more of a first logical processor type and a second logical processor type, the first logical processor type associated with a first core type and the second logical processor type associated with a second core type; a scheduler to schedule a plurality of threads for execution on the plurality of logical processors in accordance with performance data associated with the plurality of threads; wherein if the performance data indicates that a new thread should be executed on a logical processor of the first logical processor type, but all logical processors of the first logical processor type are busy, the scheduler is to determine whether to migrate a second thread from the logical processors of the first logical processor type to a logical processor of the second logical processor type based on an evaluation of first and second performance values associated with execution of the first thread on the first or second logical processor types, respectively, and further based on an evaluation of third and fourth performance values associated with execution of the second thread on the first or second logical processor types, respectively.
-
Publication No.: US11256506B2
Publication date: 2022-02-22
Application No.: US16831007
Filing date: 2020-03-26
Applicant: INTEL CORPORATION
Inventor: Ahmad Yasin
Abstract: A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
-
Publication No.: US20210342134A1
Publication date: 2021-11-04
Application No.: US17033751
Filing date: 2020-09-26
Applicant: Intel Corporation
Inventor: Ahmad Yasin, Lihu Rappoport, Jared W. Stark, Jeffrey Baxter, Israel Diamand, Pavel Fridman, Ibrahim Hur, Nir Tell
IPC: G06F8/41
Abstract: Embodiments of apparatuses, methods, and systems for code prefetching are described. In an embodiment, an apparatus includes an instruction decoder, load circuitry, and execution circuitry. The instruction decoder is to decode a code prefetch instruction. The code prefetch instruction is to specify a first instruction to be prefetched. The load circuitry is to prefetch the first instruction in response to the decoded code prefetch instruction. The execution circuitry is to execute the first instruction at a fetch stage of a pipeline.