-
Publication No.: US11687391B2
Publication date: 2023-06-27
Application No.: US17516584
Filing date: 2021-11-01
Applicant: Intel Corporation
Inventor: Gaurav Porwal, Subhankar Panda, John G. Holm
IPC: G06F11/07
CPC classification number: G06F11/0724, G06F11/079, G06F11/0736, G06F11/0751, G06F11/0787, G06F11/0793
Abstract: Upon occurrence of multiple errors in a central processing unit (CPU) package, data indicating the errors is stored in machine check (MC) banks. A timestamp corresponding to each error is stored, the timestamp indicating a time of occurrence for each error. A machine check exception (MCE) handler is generated to address the errors based on the timestamps. The timestamps can be stored in the MC banks or in a utility box (U-box). The MCE handler can then address the errors based on order of occurrence, for example by determining that the first error in time causes the remaining error. The MCE can isolate hardware/software associated with the first error to recover from a failure. The MCE can report only the first error to the operating system (OS) or other error management software/hardware. The U-Box may also convert the timestamps into real time to support user debugging.
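The ordering idea in this abstract can be illustrated with a minimal Python sketch. It assumes each logged error carries a bank index and a raw timestamp; `LoggedError`, `order_by_occurrence`, and `first_error` are hypothetical names chosen for illustration and are not taken from the patent or from any real MCE interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LoggedError:
    bank: int        # hypothetical MC bank index that logged the error
    timestamp: int   # hypothetical raw counter value captured with the error


def order_by_occurrence(errors: List[LoggedError]) -> List[LoggedError]:
    """Return the errors sorted from earliest to latest timestamp."""
    return sorted(errors, key=lambda e: e.timestamp)


def first_error(errors: List[LoggedError]) -> Optional[LoggedError]:
    """Return the earliest error, treated here as the likely root cause."""
    ordered = order_by_occurrence(errors)
    return ordered[0] if ordered else None


# Three errors logged out of order; the earliest one is in bank 1.
logged = [
    LoggedError(bank=4, timestamp=1200),
    LoggedError(bank=1, timestamp=950),
    LoggedError(bank=7, timestamp=1010),
]
root = first_error(logged)
```

In this sketch only `root` would be passed on to the OS, which mirrors the abstract's option of reporting only the first error.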
-
Publication No.: US20230091969A1
Publication date: 2023-03-23
Application No.: US17483123
Filing date: 2021-09-23
Applicant: Intel Corporation
Inventor: Gaurav Porwal, Theodros Yigzaw, Subhankar Panda, John Holm
Abstract: Methods and apparatus relating to lane based normalized historical error counter view for faulty lane isolation and disambiguation of transient versus persistent errors are described. In an embodiment, a plurality of storage entries store error information to be detected at one or more physical lanes of an interface. Faulty lane detection logic circuitry determines which of the one or more physical lanes is faulty or more likely to be faulty based at least in part on the stored error information for the one or more physical lanes of the interface. The stored error information comprises historical error details for the one or more physical lanes of the interface. Other embodiments are also disclosed and claimed.
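A rough Python sketch of one way a per-lane historical view could be used follows. The abstract does not state how counts are normalized or how transient and persistent errors are told apart, so the choices here are assumptions: errors are divided by traffic per history window, and a lane counts as persistent when more than half of its windows exceed a threshold. All function names are hypothetical.

```python
from typing import Dict, List, Optional


def normalized_rates(errors: List[int], traffic: List[int]) -> List[float]:
    """Per-window error rate: errors divided by traffic (0.0 for idle windows)."""
    return [e / t if t else 0.0 for e, t in zip(errors, traffic)]


def classify_lane(rates: List[float], threshold: float) -> str:
    """Label a lane by how many history windows exceed the threshold."""
    bad = sum(1 for r in rates if r > threshold)
    if bad == 0:
        return "healthy"
    # Assumption: errors in more than half of the windows count as persistent.
    return "persistent" if bad * 2 > len(rates) else "transient"


def most_suspect_lane(history: Dict[int, List[float]]) -> Optional[int]:
    """Return the lane with the highest mean normalized error rate."""
    def mean(rates: List[float]) -> float:
        return sum(rates) / len(rates) if rates else 0.0

    if not history:
        return None
    return max(history, key=lambda lane: mean(history[lane]))


# Four history windows per lane, each with 1000 units of traffic.
traffic = [1000, 1000, 1000, 1000]
lane_rates = {
    0: normalized_rates([0, 1, 0, 0], traffic),  # one isolated error
    1: normalized_rates([5, 6, 4, 7], traffic),  # errors in every window
    2: normalized_rates([0, 0, 0, 0], traffic),  # clean
}
labels = {lane: classify_lane(r, threshold=0.0005) for lane, r in lane_rates.items()}
suspect = most_suspect_lane(lane_rates)
```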
-
Publication No.: US12044730B2
Publication date: 2024-07-23
Application No.: US17131477
Filing date: 2020-12-22
Applicant: Intel Corporation
Inventor: Gaurav Porwal, Subhankar Panda, Theodros Yigzaw, John Holm
IPC: G01R31/28, G01R31/317
CPC classification number: G01R31/317
Abstract: Techniques and mechanisms for providing performance monitoring information. In an embodiment, a performance monitor circuit receives a communication which indicates a format comprising multiple fields which are each to store a respective count of monitored events. A programming of the performance monitor circuit, based on the communication, designates first bits and second bits of the register to provide, respectively, a first field and a second field according to the format. Performance monitoring subsequent to the programming successively tallies a first count of first events which occur during a first period of time, and a second count of second events which occur during a second period of time. In another embodiment, performance monitoring results in the register concurrently storing both the first count and the second count.
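The idea of one register holding several count fields can be sketched in Python as below. This is an illustration only: the `PackedCounterRegister` class, the 32-bit width, the two 16-bit fields, and the saturating behavior on overflow are all assumptions that the abstract does not specify.

```python
from typing import List, Tuple


class PackedCounterRegister:
    """A fixed-width register split into adjacent counter fields."""

    def __init__(self, width: int, field_widths: List[int]) -> None:
        if sum(field_widths) > width:
            raise ValueError("fields do not fit in the register")
        self.width = width
        self.value = 0
        self.fields: List[Tuple[int, int]] = []  # (shift, mask) per field
        shift = 0
        for w in field_widths:
            self.fields.append((shift, (1 << w) - 1))
            shift += w

    def read(self, index: int) -> int:
        """Extract the count stored in one field."""
        shift, mask = self.fields[index]
        return (self.value >> shift) & mask

    def tally(self, index: int, events: int = 1) -> None:
        """Add events to one field, saturating at the field maximum (assumption)."""
        shift, mask = self.fields[index]
        count = min(self.read(index) + events, mask)
        self.value = (self.value & ~(mask << shift)) | (count << shift)


# Program a 32-bit register as two 16-bit fields, then tally two periods.
reg = PackedCounterRegister(32, [16, 16])
reg.tally(0, 3)   # first events, first period
reg.tally(1, 5)   # second events, second period
```

After both tallies the single register value holds both counts at once, which corresponds to the abstract's "concurrently storing" embodiment.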
-
Publication No.: US20180150345A1
Publication date: 2018-05-31
Application No.: US15362522
Filing date: 2016-11-28
Applicant: Intel Corporation
Inventor: Gaurav Porwal, Subhankar Panda, John G. Holm
IPC: G06F11/07
Abstract: Upon occurrence of multiple errors in a central processing unit (CPU) package, data indicating the errors is stored in machine check (MC) banks. A timestamp corresponding to each error is stored, the timestamp indicating a time of occurrence for each error. A machine check exception (MCE) handler is generated to address the errors based on the timestamps. The timestamps can be stored in the MC banks or in a utility box (U-box). The MCE handler can then address the errors based on order of occurrence, for example by determining that the first error in time causes the remaining error. The MCE can isolate hardware/software associated with the first error to recover from a failure. The MCE can report only the first error to the operating system (OS) or other error management software/hardware. The U-Box may also convert the timestamps into real time to support user debugging.
-
Publication No.: US20220196733A1
Publication date: 2022-06-23
Application No.: US17131477
Filing date: 2020-12-22
Applicant: Intel Corporation
Inventor: Gaurav Porwal, Subhankar Panda, Theodros Yigzaw, John Holm
IPC: G01R31/317
Abstract: Techniques and mechanisms for providing performance monitoring information. In an embodiment, a performance monitor circuit receives a communication which indicates a format comprising multiple fields which are each to store a respective count of monitored events. A programming of the performance monitor circuit, based on the communication, designates first bits and second bits of the register to provide, respectively, a first field and a second field according to the format. Performance monitoring subsequent to the programming successively tallies a first count of first events which occur during a first period of time, and a second count of second events which occur during a second period of time. In another embodiment, performance monitoring results in the register concurrently storing both the first count and the second count.
-
Publication No.: US10824493B2
Publication date: 2020-11-03
Application No.: US15606799
Filing date: 2017-05-26
Applicant: Intel Corporation
Inventor: Subhankar Panda, Gaurav Porwal
Abstract: A mechanism for disambiguation of error logging during a warm reset is disclosed. A system agent detects an error occurring during bootstrapping of a processor package. The error occurs prior to initiation of a machine check system. A wide pulse event is initiated to signal a wide pulse register to store a wide pulse time stamp counter value. The wide pulse event also signals a lap register to store a lap time stamp counter value. The wide pulse register maintains the wide pulse time stamp counter value during a warm reset, and the lap register clears the lap time stamp counter value during the warm reset. The system agent obtains the wide pulse time stamp counter value and the lap time stamp counter value after bootstrapping is complete to determine an order of occurrence of the error relative to the warm reset.
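The two-register comparison described in this abstract can be modeled with a short Python sketch. It assumes that a value of zero means "cleared or never written" and that time stamp counter values are non-zero when latched; the `TimestampRegisters` class and `error_order` function are hypothetical names, not the patent's.

```python
from dataclasses import dataclass


@dataclass
class TimestampRegisters:
    wide_pulse: int = 0  # keeps its value across a warm reset
    lap: int = 0         # cleared by a warm reset

    def wide_pulse_event(self, tsc: int) -> None:
        """Latch the current time stamp counter value into both registers."""
        self.wide_pulse = tsc
        self.lap = tsc

    def warm_reset(self) -> None:
        """Model a warm reset: only the lap register is cleared."""
        self.lap = 0


def error_order(regs: TimestampRegisters) -> str:
    """Infer when the error occurred relative to the warm reset."""
    if regs.wide_pulse == 0:
        return "none"
    if regs.lap == 0:
        # The lap value was wiped, so the error preceded the reset.
        return "before_reset"
    return "after_reset"


# Case 1: error during bootstrapping, then a warm reset.
early = TimestampRegisters()
early.wide_pulse_event(tsc=5000)
early.warm_reset()

# Case 2: warm reset first, then the error.
late = TimestampRegisters()
late.warm_reset()
late.wide_pulse_event(tsc=7000)
```

In this model `"after_reset"` also covers the case where no warm reset happened at all, since both registers then still hold the latched value.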
-
Publication No.: US20180349231A1
Publication date: 2018-12-06
Application No.: US15610067
Filing date: 2017-05-31
Applicant: Intel Corporation
Inventor: Subhankar Panda, Sarathy Jayakumar, Gaurav Porwal, Theodros Yigzaw
IPC: G06F11/14
CPC classification number: G06F11/0793, G06F11/0772, G06F11/0796, G06F11/1415, G06F11/142, G06F11/1441
Abstract: A computing apparatus, including: a hardware platform including a processor and memory; and a system management interrupt (SMI) handler; first logic configured to provide a first container and a second container via the hardware platform; and second logic configured to: detect an uncorrectable error in the first container; responsive to the detecting, generate a degraded system state; provide a degraded state message to the SMI handler; instruct the second container to seek a recoverable state; determine that the second container has entered a recoverable state; and initiate a recovery operation.
-
Publication No.: US10929232B2
Publication date: 2021-02-23
Application No.: US15610067
Filing date: 2017-05-31
Applicant: Intel Corporation
Inventor: Subhankar Panda, Sarathy Jayakumar, Gaurav Porwal, Theodros Yigzaw
Abstract: A computing apparatus, including: a hardware platform including a processor and memory; and a system management interrupt (SMI) handler; first logic configured to provide a first container and a second container via the hardware platform; and second logic configured to: detect an uncorrectable error in the first container; responsive to the detecting, generate a degraded system state; provide a degraded state message to the SMI handler; instruct the second container to seek a recoverable state; determine that the second container has entered a recoverable state; and initiate a recovery operation.
-
Publication No.: US10824496B2
Publication date: 2020-11-03
Application No.: US15857376
Filing date: 2017-12-28
Applicant: Intel Corporation
Inventor: Subhankar Panda, Gaurav Porwal, John G. Holm
Abstract: An apparatus and method for machine check bank reporting in a processor. For example, one embodiment includes a processor comprising: one or more cores to execute instructions and process data; a plurality of machine check architecture banks to store errors detected during execution of the instructions; error monitoring circuitry to detect the errors and responsively update the MCA banks; and a first error register (FERR) into which a first error vector is to be stored to identify an MCA bank containing a first error in an error sequence, the error monitoring circuitry to update the first error vector responsive to detecting the first error; and one or more next error registers (NERRs) to store one or more error vectors to one or more other MCA banks containing subsequent errors occurring after the first error.
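A small Python model of the FERR/NERR arrangement may help make the sequence tracking concrete. The encoding is an assumption: each error vector is modeled as a one-hot bit mask of the MCA bank index, and errors beyond the available NERRs are simply not tracked. `ErrorSequenceRegisters` is a hypothetical name.

```python
from typing import List, Optional


class ErrorSequenceRegisters:
    """One first error register (FERR) plus a fixed number of NERRs."""

    def __init__(self, num_nerrs: int) -> None:
        self.ferr: Optional[int] = None  # vector for the first error
        self.nerrs: List[int] = []       # vectors for subsequent errors
        self.num_nerrs = num_nerrs

    def record(self, bank: int) -> None:
        """Record an error detected in the given MCA bank."""
        vector = 1 << bank  # assumption: one-hot vector per bank
        if self.ferr is None:
            self.ferr = vector
        elif len(self.nerrs) < self.num_nerrs:
            self.nerrs.append(vector)
        # Assumption: errors beyond the available NERRs are not tracked here.

    def first_bank(self) -> Optional[int]:
        """Decode the FERR vector back to a bank index."""
        return None if self.ferr is None else self.ferr.bit_length() - 1


# Errors arrive in banks 5, 2, 9, 1; only two NERRs are available.
regs = ErrorSequenceRegisters(num_nerrs=2)
for bank in (5, 2, 9, 1):
    regs.record(bank)
```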
-
Publication No.: US10671465B2
Publication date: 2020-06-02
Application No.: US15362522
Filing date: 2016-11-28
Applicant: Intel Corporation
Inventor: Gaurav Porwal, Subhankar Panda, John G. Holm
IPC: G06F11/07
Abstract: Upon occurrence of multiple errors in a central processing unit (CPU) package, data indicating the errors is stored in machine check (MC) banks. A timestamp corresponding to each error is stored, the timestamp indicating a time of occurrence for each error. A machine check exception (MCE) handler is generated to address the errors based on the timestamps. The timestamps can be stored in the MC banks or in a utility box (U-box). The MCE handler can then address the errors based on order of occurrence, for example by determining that the first error in time causes the remaining error. The MCE can isolate hardware/software associated with the first error to recover from a failure. The MCE can report only the first error to the operating system (OS) or other error management software/hardware. The U-Box may also convert the timestamps into real time to support user debugging.