Abstract:
An apparatus and method for supporting communication during error handling in a computing system. A computing system includes a first partition and a second partition, each capable of performing error management based on a respective machine check architecture (MCA). When a host processor in the first partition detects an error that requires information from processor cores of the second partition, the host processor generates an access request with a target address pointing to a storage location in a memory of the second partition, not the first partition. When the host processor receives the requested error log information from the second partition, the host processor completes processing of the error. To support the host processor in generating the target address for the access request, during an earlier bootup operation, the second partition communicates the hardware topology of the second partition to the host processor.
Abstract:
Aspects described herein may relate to a transaction exchange platform using a streaming data platform (SDP) and microservices to process transactions according to review and approval workflows. The transaction exchange platform may receive transactions from origination sources, which may be added to the SDP as transaction objects. As the transactions are received, the transactions may be analyzed to detect duplicate transactions and/or errors in the transactions. The transaction exchange platform may take steps to remediate transactions that are recognized as duplicates or predicted to generate one or more errors. Similarly, the transaction exchange platform may take steps to remediate transactions that are rejected by a clearinghouse.
Abstract:
Techniques are disclosed relating to automated operations management. In various embodiments, a computer system accesses operational information that defines commands for an operational scenario and accesses blueprints that describe operational entities in a target computer environment related to the operational scenario. The computer system implements the operational scenario for the target computer environment. The implementing may include executing a hierarchy of controller modules that include an orchestrator controller module at top level of the hierarchy that is executable to carry out the commands by issuing instructions to controller modules at a next level. The controller modules may be executable to manage the operational entities according to the blueprints to complete the operational scenario. In various embodiments, the computer system includes additional features such as an application programming interface (API), a remote routing engine, a workflow engine, a reasoning engine, a security engine, and a testing engine.
Abstract:
In an example embodiment, information about one or more failed payment attempts via an electronic payment processing system is obtained. One or more features are extracted from the information. Then, for each of a plurality of potential candidate retry time points, the one or more features and the potential candidate retry time point are fed into a dunning model, the dunning model trained via a machine-learning algorithm to produce a dunning score indicative of a likelihood that a retry attempt at an input retry time point will result in a successful payment processing. The dunning scores for the plurality of potential candidate retry time points are used to select a desired retry time point. Then the electronic payment processing system is caused to attempt to reprocess a payment associated with one of the failed payment attempts at a time matching the desired retry time point.
Abstract:
Techniques are disclosed relating to automated operations management. In various embodiments, a computer system accesses operational information that defines commands for an operational scenario and accesses blueprints that describe operational entities in a target computer environment related to the operational scenario. The computer system implements the operational scenario for the target computer environment. The implementing may include executing a hierarchy of controller modules that include an orchestrator controller module at top level of the hierarchy that is executable to carry out the commands by issuing instructions to controller modules at a next level. The controller modules may be executable to manage the operational entities according to the blueprints to complete the operational scenario. In various embodiments, the computer system includes additional features such as an application programming interface (API), a remote routing engine, a workflow engine, a reasoning engine, a security engine, and a testing engine.
Abstract:
A memory controller includes a request pipeline and a retry control circuit. The request pipeline receives an input of a request to a memory output from a processor core, stores the request, and causes the memory to process the request in order of storage. The retry control circuit stops a new request input to the request pipeline when an error occurs in the memory, and re-inputs, to the request pipeline, requests to be retried that includes the request in which the error has occurred and a subsequent request stored in the request pipeline.
Abstract:
A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
Abstract:
Conventional semiconductor devices are problematic in that an operation cannot be continued in the event of a failure of one of CPU cores performing a lock step operation and, as a result, reliability cannot be improved. The semiconductor device according to the present invention includes a computing unit including a first CPU core and a second CPU core that perform a lock step operation, wherein the first CPU core and the second CPU core respectively diagnose failures of internal logic circuits, and a sequence control circuit switches the CPU core that outputs data to a shared resource, in the computing unit based on the diagnose result.
Abstract:
A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
Abstract:
State recovery methods and apparatus for computing platforms are disclosed. An example method includes inserting, with a processor, a first instruction into optimized code to cause a first portion of a register in a first state to be saved to memory before execution of a region of the optimized code, maintaining, with the processor, a first indication of a first manner in which the first portion of the register is to be restored in connection with a state recovery after execution of the region of the optimized code, and maintaining, with the processor, a second indication of a second manner in which a second portion of the register is to be restored in connection with the state recovery after execution of the region of the optimized code.