Abstract:
In one aspect of the present disclosure, there is provided a memory system comprising a memory device configured to temporarily store data therein, the data being loaded thereon for programming a selected page among multiple pages, the memory device further configured to program the selected page using the data; and a controller configured to send the data to the memory device, wherein the controller is further configured to control the memory device such that, in the event of a program failure for the selected page, the memory device re-programs another page using the data temporarily stored therein without receipt of further data from the controller.
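The re-program behavior described above can be illustrated with a minimal sketch. The class names, the `bad_pages` failure model, and the `fallback_pages` argument are all hypothetical and are not taken from the disclosure; the point is only that the data crosses from controller to device once, and a retry on another page reuses the device-side buffer.

```python
class MemoryDevice:
    """Toy device with a page buffer (hypothetical model, not the claimed design)."""

    def __init__(self, num_pages, bad_pages=()):
        self.pages = [None] * num_pages
        self.bad_pages = set(bad_pages)  # assumption: these pages always fail to program
        self.buffer = None               # data temporarily stored on the device

    def load(self, data):
        """Receive data from the controller into the temporary buffer."""
        self.buffer = data

    def program(self, page):
        """Program `page` from the buffer; return True on success, False on failure."""
        if page in self.bad_pages:
            return False
        self.pages[page] = self.buffer
        return True


class Controller:
    """Toy controller that sends data once and retries on other pages."""

    def __init__(self, device):
        self.device = device
        self.transfers = 0  # how many times data was sent to the device

    def write(self, data, page, fallback_pages):
        self.device.load(data)
        self.transfers += 1
        # On failure, re-program another page from the buffered data;
        # no further data is sent by the controller.
        for candidate in [page, *fallback_pages]:
            if self.device.program(candidate):
                return candidate
        raise IOError("no page could be programmed")
```

For example, with page 0 marked bad, `Controller(MemoryDevice(4, bad_pages={0})).write(b"abc", 0, [1, 2])` lands the data on page 1 after a single transfer.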
Abstract:
Exemplary embodiments provide a way to manage data recovery in a distributed system having multiple data store nodes. A storage system comprises: a first node including a first processor; and a plurality of second nodes coupled to the first node, each of the plurality of second nodes including a second processor and one or more second storage devices. The first processor is configured to control storage of data and replicas of the data in the second storage devices of two or more second nodes. If at least one of the second nodes has failed and a storage capacity of the plurality of second nodes is below a given threshold, one of the second nodes is configured to receive first data, which is a replica of data stored in a failed second node, from another of the second nodes, and create parity data based on the received first data.
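As a rough illustration of the condition and the parity step, the sketch below uses bytewise XOR parity. The abstract does not say which parity scheme is used, so XOR is an assumption, as are the function names and the way capacity is measured.

```python
def should_build_parity(failed_nodes, capacity, threshold):
    """Both conditions from the abstract: a node has failed and capacity is below the threshold."""
    return failed_nodes > 0 and capacity < threshold


def xor_parity(blocks):
    """Bytewise XOR of the given blocks (assumed scheme); shorter blocks are zero-padded."""
    size = max((len(b) for b in blocks), default=0)
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)
```

With XOR parity, any one block can be rebuilt from the parity and the remaining blocks, which is why parity needs less space than a full replica.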
Abstract:
A cyclic commit protocol is used to store relationships between transactions and is used by the technology to determine whether a transaction is committed or not. The protocol allows creation of a cycle of transactions which can be used to recover the state of a storage device after a host failure by identifying the last committed version of intention records as committed or uncommitted based on the data stored in the physical pages.
Abstract:
A layout of a transaction log enables efficient logging of metadata into entries of the log, as well as efficient reclamation and recovery of the log entries by a volume layer of a storage input/output (I/O) stack executing on one or more nodes of a cluster. The transaction log is illustratively a two-stage, append-only logging structure, wherein the first stage is non-volatile random access memory (NVRAM) embodied as an NVlog and the second stage is disk, e.g., solid state drive (SSD). During crash recovery, the log entries are examined for consistency and scanned to identify those entries that have completed and those that are active, which require replay. The log entries are walked from oldest to newest (using sequence numbers) searching for the highest sequence number. Partially complete log entries (e.g., log entries in-progress when a crash occurs) may be discarded for failing a checksum (e.g., a CRC error). Old value/new value logs may be used to implement roll-forward or roll-back semantics to replay the log entries and fix any on-disk data structures, first from NVRAM and then from on-disk logs.
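The recovery scan can be sketched as follows. The entry layout (a sequence number, a completed flag, a payload length, and a trailing CRC-32) is a hypothetical encoding chosen for illustration; the abstract does not specify the on-media format. The sketch discards entries that fail the checksum, walks the rest in sequence order to find the highest sequence number, and returns the active (not completed) entries as replay candidates.

```python
import struct
import zlib

# Hypothetical header: sequence number (u64), completed flag (u8), payload length (u32).
HEADER = struct.Struct("<QBI")
CRC = struct.Struct("<I")


def encode_entry(seq, completed, payload):
    """Serialize one log entry and append a CRC-32 over header and payload."""
    body = HEADER.pack(seq, int(completed), len(payload)) + payload
    return body + CRC.pack(zlib.crc32(body))


def decode_entry(raw):
    """Return (seq, completed, payload), or None if the entry is torn or corrupt."""
    if len(raw) < HEADER.size + CRC.size:
        return None
    body = raw[:-CRC.size]
    (stored_crc,) = CRC.unpack(raw[-CRC.size:])
    if zlib.crc32(body) != stored_crc:
        return None  # partially written entry: fails the checksum
    seq, completed, length = HEADER.unpack(body[:HEADER.size])
    payload = body[HEADER.size:]
    if len(payload) != length:
        return None
    return seq, bool(completed), payload


def recover(raw_entries):
    """Return (highest sequence number, list of active entries to replay)."""
    valid = [e for e in map(decode_entry, raw_entries) if e is not None]
    valid.sort(key=lambda e: e[0])  # walk oldest to newest
    highest = valid[-1][0] if valid else None
    to_replay = [(seq, payload) for seq, done, payload in valid if not done]
    return highest, to_replay
```

In a two-stage log, the same scan would be applied first to the NVRAM entries and then to the on-disk entries.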
Abstract:
A computer implemented method implemented with a processor for assigning a unique identifier for a data item initially deployed at a node of a networked environment includes determining a unique node identifier for the node of the networked environment, atomically modifying a local counter value at the node of the networked environment, and appending the unique node identifier to the atomically modified local counter value at the node of the networked environment to form a unique ID for the data item.
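A minimal sketch of this scheme is shown below. The class name, the use of a lock to make the counter update atomic, and the `-` separator are illustrative assumptions; the abstract only requires that the counter be modified atomically and that the node identifier be appended to it.

```python
import threading


class UniqueIdGenerator:
    """Builds IDs of the form "<counter>-<node_id>" (the format is an assumption)."""

    def __init__(self, node_id):
        self.node_id = node_id   # must be unique across the networked environment
        self._counter = 0        # local to this node
        self._lock = threading.Lock()

    def next_id(self):
        # The lock makes the read-modify-write of the counter atomic,
        # so concurrent callers never observe the same value.
        with self._lock:
            self._counter += 1
            value = self._counter
        # Append the node identifier to the counter value.
        return f"{value}-{self.node_id}"
```

Because the counter is unique within a node and the node identifier is unique across nodes, the combined ID is unique system-wide without any coordination between nodes.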
Abstract:
The disclosed method includes, at a storage controller of a storage system, receiving host instructions to modify configuration settings corresponding to a first memory portion of a plurality of memory portions. The method includes, in response to receiving the host instructions to modify the configuration settings, identifying the first memory portion from the host instructions and modifying the configuration settings corresponding to the first memory portion, in accordance with the host instructions. The method includes, after modifying the configuration settings corresponding to the first memory portion, sending one or more commands to perform memory operations having one or more physical addresses corresponding to the first memory portion and receiving a failure notification indicating failed performance of at least a first memory operation of the one or more memory operations. The method includes, in response to receiving the failure notification, executing one or more error recovery mechanisms.
Abstract:
A method for enhanced restart of a core dumping application is provided. The method includes stopping a plurality of threads in an address space, except for the thread performing the core dump. Computational segments are remapped to client segments. Each open file descriptor in the address space is closed. The application is terminated and the client segments are flushed to external storage.
Abstract:
A method and system for securing continued operation of a primary cloud-based computing environment (CBCE) residing in a first cloud environment are disclosed. The method comprises gathering information respective of the primary CBCE; storing the gathered information in a storage space, wherein the gathered information substantially provides a baseline to initiate the creation of a reconstructed CBCE upon a need to recreate the primary CBCE; updating the gathered information with new information gathered respective of changes to the primary CBCE; receiving a periodic status notification from the primary CBCE; and initiating a reconstruction of the primary CBCE in a second cloud environment responsive to the status notification indicating one of: a reconstruction request and a failure of the primary CBCE.
Abstract:
Embodiments include a computer system for temporary pipeline marking for processor error workarounds, the computer system having a processor configured to perform a method. The method includes monitoring a pipeline of the processor for an event that is predetermined to place the processor in a stuck state that results in an errant instruction execution result due to the stuck state or repeated resource contention causing performance degradation. The pipeline is marked for a workaround action based on detecting the event. A clearing action is triggered based on the marking of the pipeline. The marking of the pipeline is cleared based on the triggering of the clearing action.
Abstract:
Some aspects of the disclosure include a self-refresh entry sequence for a memory, such as a DRAM, that may be used to avoid a frequency mismatch between a system processor and a system memory. The self-refresh entry sequence may signal the memory to reset the frequency set point state and default to the power-up state upon a self-refresh process exit. In another aspect, a new mode register may be used to indicate that the frequency set point needs to be reset after the next self-refresh entry command. In this aspect, the processor will execute a mode register write command followed by a self-refresh entry in response to the occurrence of a crash event. Then, the memory will reset to the default frequency set point by the end of self-refresh entry execution.