Scalable machine check architecture

    Publication No.: US12072756B2

    Publication date: 2024-08-27

    Application No.: US17854710

    Filing date: 2022-06-30

    CPC classification number: G06F11/0772 G06F11/0787 G06F11/1405 G06F12/0292

    Abstract: An apparatus and method for supporting communication during error handling in a computing system. A computing system includes a first partition and a second partition, each capable of performing error management based on a respective machine check architecture (MCA). When a host processor in the first partition detects an error that requires information from processor cores of the second partition, the host processor generates an access request with a target address pointing to a storage location in a memory of the second partition, not the first partition. When the host processor receives the requested error log information from the second partition, the host processor completes processing of the error. To support the host processor in generating the target address for the access request, during an earlier bootup operation, the second partition communicates the hardware topology of the second partition to the host processor.
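
As a rough illustration of the target-address step described in this abstract, the sketch below computes the address of an error-log entry in the second partition's memory from a topology record reported at bootup. The record fields, the flat per-core, per-bank log layout, and all numeric values are assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionTopology:
    """Hypothetical topology record the second partition reports at bootup."""
    log_base: int        # base address of the error-log region in partition memory
    num_cores: int       # number of processor cores in the partition
    banks_per_core: int  # error-reporting banks per core (assumed uniform)
    entry_size: int      # bytes per error-log entry


def error_log_address(topo: PartitionTopology, core: int, bank: int) -> int:
    """Compute the target address of one core/bank error-log entry."""
    if not 0 <= core < topo.num_cores:
        raise ValueError(f"core {core} out of range")
    if not 0 <= bank < topo.banks_per_core:
        raise ValueError(f"bank {bank} out of range")
    # Assumed layout: entries are stored contiguously, core-major.
    index = core * topo.banks_per_core + bank
    return topo.log_base + index * topo.entry_size


# Made-up topology values, standing in for what bootup would communicate.
topo = PartitionTopology(log_base=0x8000_0000, num_cores=4,
                         banks_per_core=8, entry_size=64)
addr = error_log_address(topo, core=2, bank=3)
print(hex(addr))  # 0x800004c0
```

The host would then place `addr` in its access request; how the request is routed to the second partition is outside the scope of this sketch.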

    Application programming interface for automated operations management

    Publication No.: US12001284B2

    Publication date: 2024-06-04

    Application No.: US16702075

    Filing date: 2019-12-03

    Inventor: Mark F. Wilding

    Abstract: Techniques are disclosed relating to automated operations management. In various embodiments, a computer system accesses operational information that defines commands for an operational scenario and accesses blueprints that describe operational entities in a target computer environment related to the operational scenario. The computer system implements the operational scenario for the target computer environment. The implementing may include executing a hierarchy of controller modules that include an orchestrator controller module at top level of the hierarchy that is executable to carry out the commands by issuing instructions to controller modules at a next level. The controller modules may be executable to manage the operational entities according to the blueprints to complete the operational scenario. In various embodiments, the computer system includes additional features such as an application programming interface (API), a remote routing engine, a workflow engine, a reasoning engine, a security engine, and a testing engine.
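
The controller hierarchy in this abstract can be pictured with a small sketch: an orchestrator at the top level receives each command and issues it to the controllers at the next level, each of which manages one entity according to its blueprint. The `Controller` class, the `supports` blueprint key, and the entity names are hypothetical and are not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical controller module managing one operational entity."""
    name: str
    blueprint: dict                      # describes the entity (assumed shape)
    children: list[Controller] = field(default_factory=list)

    def handle(self, command: str, log: list[str]) -> None:
        # Apply the command to this controller's own entity if the blueprint allows it.
        if command in self.blueprint.get("supports", []):
            log.append(f"{self.name}: {command}")
        # Issue the same instruction to controllers at the next level down.
        for child in self.children:
            child.handle(command, log)


def run_scenario(orchestrator: Controller, commands: list[str]) -> list[str]:
    """Carry out an operational scenario, returning the actions performed in order."""
    log: list[str] = []
    for command in commands:
        orchestrator.handle(command, log)
    return log


database = Controller("database", {"supports": ["start", "stop"]})
app_server = Controller("app-server", {"supports": ["start"]})
orchestrator = Controller("orchestrator", {"supports": []}, [database, app_server])

actions = run_scenario(orchestrator, ["start", "stop"])
print(actions)
```

Here the orchestrator performs no action of its own and only fans commands out, which is one possible reading of "issuing instructions to controller modules at a next level".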

    Optimized dunning using machine-learned model

    Publication No.: US11915247B2

    Publication date: 2024-02-27

    Application No.: US17888297

    Filing date: 2022-08-15

    Applicant: Stripe, Inc.

    CPC classification number: G06Q20/425 G06F11/1405 G06N20/00

    Abstract: In an example embodiment, information about one or more failed payment attempts via an electronic payment processing system is obtained. One or more features are extracted from the information. Then, for each of a plurality of potential candidate retry time points, the one or more features and the potential candidate retry time point are fed into a dunning model, the dunning model trained via a machine-learning algorithm to produce a dunning score indicative of a likelihood that a retry attempt at an input retry time point will result in a successful payment processing. The dunning scores for the plurality of potential candidate retry time points are used to select a desired retry time point. Then the electronic payment processing system is caused to attempt to reprocess a payment associated with one of the failed payment attempts at a time matching the desired retry time point.
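
The selection loop in this abstract (score every candidate retry time, then pick one) can be sketched as follows. The `dunning_score` function is only a stand-in with hand-written weights, since the trained model itself is not described here; the feature names and candidate hours are likewise assumptions.

```python
import math


def dunning_score(features: dict, retry_hour: int) -> float:
    """Stand-in for the trained dunning model: returns a likelihood in (0, 1).

    The weights below are invented for illustration, not learned from data.
    """
    z = -1.0
    if features["failure_code"] == "insufficient_funds" and retry_hour >= 72:
        z += 0.8                             # assume funds are likelier to arrive later
    z -= 0.3 * features["prior_attempts"]    # assume repeated failures lower the odds
    z += 0.01 * retry_hour
    return 1.0 / (1.0 + math.exp(-z))        # logistic squashing


def select_retry_time(features: dict, candidate_hours: list[int]) -> tuple[int, dict]:
    """Score each candidate retry time point and return the highest-scoring one."""
    scores = {hour: dunning_score(features, hour) for hour in candidate_hours}
    best = max(scores, key=scores.get)
    return best, scores


features = {"failure_code": "insufficient_funds", "prior_attempts": 1}
best_hour, scores = select_retry_time(features, [24, 48, 72, 96])
print(best_hour)  # 96
```

Taking the maximum score is the simplest selection rule; the abstract only says the scores are "used to select" a time point, so other rules would fit as well.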

    WORKFLOWS FOR AUTOMATED OPERATIONS MANAGEMENT

    Publication No.: US20240045764A1

    Publication date: 2024-02-08

    Application No.: US18483340

    Filing date: 2023-10-09

    Inventor: Mark F. Wilding

    Abstract: Techniques are disclosed relating to automated operations management. In various embodiments, a computer system accesses operational information that defines commands for an operational scenario and accesses blueprints that describe operational entities in a target computer environment related to the operational scenario. The computer system implements the operational scenario for the target computer environment. The implementing may include executing a hierarchy of controller modules that include an orchestrator controller module at top level of the hierarchy that is executable to carry out the commands by issuing instructions to controller modules at a next level. The controller modules may be executable to manage the operational entities according to the blueprints to complete the operational scenario. In various embodiments, the computer system includes additional features such as an application programming interface (API), a remote routing engine, a workflow engine, a reasoning engine, a security engine, and a testing engine.

    RESILIENCY TO MEMORY FAILURES IN COMPUTER SYSTEMS

    Type: Invention application

    Legal status: Granted

    Publication No.: US20170068596A1

    Publication date: 2017-03-09

    Application No.: US15357448

    Filing date: 2016-11-21

    Applicant: Cray Inc.

    Abstract: A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
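
The store, load, and trap flow in this abstract can be simulated in a few lines. In this sketch the "error correction information" is simply a redundant copy of the stored value, which is an assumption chosen for brevity; the abstract does not say what encoding is used. The class and method names are hypothetical, and a Python exception plays the role of the hardware trap.

```python
class MemoryFault(Exception):
    """Raised by the simulated memory system when a location is unreadable."""


class ResilientMemory:
    def __init__(self) -> None:
        self._data = {}       # simulated main memory: address -> value
        self._faulty = set()  # addresses the memory system reports as bad
        self._recovery = {}   # error correction information (here: a plain copy)

    def store(self, addr: int, value: int) -> None:
        self._data[addr] = value
        self._faulty.discard(addr)
        # Generate and keep the correction information at store time.
        self._recovery[addr] = value

    def inject_fault(self, addr: int) -> None:
        """Test helper: corrupt a location so the next raw load fails."""
        self._data[addr] = None
        self._faulty.add(addr)

    def _raw_load(self, addr: int) -> int:
        if addr in self._faulty:
            raise MemoryFault(addr)
        return self._data[addr]

    def load(self, addr: int) -> int:
        try:
            return self._raw_load(addr)      # completes normally if no error
        except MemoryFault:
            # "Trap": re-create the data from the stored correction information,
            # repair the location, and return as if the load had succeeded.
            value = self._recovery[addr]
            self._data[addr] = value
            self._faulty.discard(addr)
            return value


mem = ResilientMemory()
mem.store(0x10, 42)
mem.inject_fault(0x10)
recovered = mem.load(0x10)
print(recovered)  # 42
```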

    STATE RECOVERY METHODS AND APPARATUS FOR COMPUTING PLATFORMS

    Type: Invention application

    Legal status: Pending (published)

    Publication No.: US20170046140A1

    Publication date: 2017-02-16

    Application No.: US15335709

    Filing date: 2016-10-27

    CPC classification number: G06F8/443 G06F9/45516 G06F11/1405 G06F2201/805

    Abstract: State recovery methods and apparatus for computing platforms are disclosed. An example method includes inserting, with a processor, a first instruction into optimized code to cause a first portion of a register in a first state to be saved to memory before execution of a region of the optimized code, maintaining, with the processor, a first indication of a first manner in which the first portion of the register is to be restored in connection with a state recovery after execution of the region of the optimized code, and maintaining, with the processor, a second indication of a second manner in which a second portion of the register is to be restored in connection with the state recovery after execution of the region of the optimized code.
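
One way to picture the per-portion recovery in this abstract is the toy model below. It treats a register as a 128-bit integer with a low and a high half, saves only the low half before the region runs, and records a restore manner for each half. The register width, the two manners (`from_memory` and `in_place`), and the assumption that the region never touches the high half are all illustrative choices, not details from the patent.

```python
LOW_MASK = (1 << 64) - 1  # low 64 bits of a simulated 128-bit register


def enter_region(register: int) -> tuple[dict, dict]:
    """Model the inserted save instruction and the two restore indications."""
    saved = {"low": register & LOW_MASK}   # first portion saved to memory
    plan = {
        "low": "from_memory",              # first indication: reload the saved copy
        "high": "in_place",                # second indication: assumed untouched
    }
    return saved, plan


def recover(register_now: int, saved: dict, plan: dict) -> int:
    """Rebuild the pre-region register value according to the recorded plan."""
    low = saved["low"] if plan["low"] == "from_memory" else register_now & LOW_MASK
    high = saved["high"] if plan["high"] == "from_memory" else register_now >> 64
    return (high << 64) | low


original = (0xAAAA << 64) | 0x1111
saved, plan = enter_region(original)
clobbered = (original & ~LOW_MASK) | 0xFFFF   # the region overwrites the low half
restored = recover(clobbered, saved, plan)
print(restored == original)  # True
```

Saving only the portion that the region may change keeps the inserted save small, which is presumably the motivation for tracking a separate restore manner per portion.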
