DETECTING AND RECOVERING FROM FATAL STORAGE ERRORS

    Publication No.: US20220083413A1

    Publication Date: 2022-03-17

    Application No.: US17532651

    Filing Date: 2021-11-22

    Abstract: The present disclosure relates to systems, methods, and computer-readable media for identifying and responding to a panic condition on a storage system on a computing node. For example, systems disclosed herein may establish recovery instructions between a host system and a storage system for responding to a future instance of a panic condition. The storage system may provide an indication of a self-detected panic condition in a variety of ways. In response to identifying the panic condition, the host system may perform one or more recovery actions in accordance with recovery instructions accessible to the host system. These actions may include resetting specific components and reinitializing communication between the host system and the storage system, which is less invasive than the slower and more expensive conventional approaches for responding to panic conditions on computing nodes.
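
The flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `StorageSystem` and `Host` classes, the `"FW_ASSERT"` panic code, and the two recovery actions are all hypothetical names chosen for illustration. The sketch assumes the host holds recovery instructions agreed in advance, keyed by panic code, and runs them when the storage system reports a panic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StorageSystem:
    """Toy stand-in for a storage system that can self-report a panic."""
    panic_code: Optional[str] = None          # None means healthy
    log: List[str] = field(default_factory=list)

    def reset_controller(self) -> None:
        # Hypothetical targeted reset; clears the panic in this toy model.
        self.log.append("reset_controller")
        self.panic_code = None

    def reinit_link(self) -> None:
        # Hypothetical re-initialization of host/storage communication.
        self.log.append("reinit_link")


class Host:
    """Host side: stores recovery instructions and applies them on panic."""

    def __init__(self, storage: StorageSystem) -> None:
        self.storage = storage
        self.recovery_instructions: Dict[str, List[Callable[[], None]]] = {}

    def establish(self, panic_code: str,
                  actions: List[Callable[[], None]]) -> None:
        # Agreed ahead of time, before any panic occurs.
        self.recovery_instructions[panic_code] = list(actions)

    def check_and_recover(self) -> bool:
        """Return True if a panic was detected (and any actions were run)."""
        code = self.storage.panic_code
        if code is None:
            return False
        for action in self.recovery_instructions.get(code, []):
            action()
        return True
```

A caller would register actions once with `host.establish(...)` and then invoke `host.check_and_recover()` from whatever polling or interrupt path reports the panic indication.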

    TIME-BASED MECHANISM SUPPORTING FLUSH OPERATION

    Publication No.: US20190354482A1

    Publication Date: 2019-11-21

    Application No.: US15985156

    Filing Date: 2018-05-21

    Abstract: The techniques disclosed herein improve performance of storage systems by providing a time-based mechanism for supporting a flush operation. In one embodiment, a flush completion time stamp is accessed that is indicative of a most recent time of completion of a cache flush by a cache flush function. The flush completion time stamp is compared with a time stamp associated with a cache flush request. Based on the comparing, an indication is generated that the requested cache flush is complete when the flush completion time stamp is more recent than the time stamp associated with the cache flush request.
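
The comparison described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the `FlushTracker` class and its method names are hypothetical, time stamps are taken from an injectable clock (defaulting to `time.monotonic`), and the actual flush work is omitted.

```python
import time
from typing import Callable


class FlushTracker:
    """Records when the most recent cache flush completed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # No flush has completed yet, so every request time is "newer".
        self._last_flush_completed = float("-inf")

    def flush_cache(self) -> None:
        # Real flush work would happen here; the sketch only records
        # the completion time stamp.
        self._last_flush_completed = self._clock()

    def is_flush_complete(self, request_time: float) -> bool:
        # The requested flush counts as complete when the latest flush
        # completion is more recent than the request's time stamp.
        return self._last_flush_completed > request_time
```

With this shape, a flush request that arrives before an already-scheduled flush finishes can be acknowledged by comparing two time stamps instead of starting another flush.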

    MAINTENANCE MODE FOR STORAGE NODES

    Publication No.: US20230089663A1

    Publication Date: 2023-03-23

    Application No.: US17800517

    Filing Date: 2021-03-15

    Abstract: A reduced throughput maintenance mode for adaptively managing input/output (I/O) operations within a resilient group of storage nodes. A first storage node in a resilient group of storage nodes is classified as operating in a normal throughput mode, and a second storage node in the resilient group is classified as operating in a reduced throughput mode. While the second node is classified as operating in the reduced throughput mode, a read I/O operation and a write I/O operation are queued for the resilient group. The read I/O operation is prioritized for assignment to the first node, so as to reduce I/O load on the second node while it operates in the reduced throughput mode. The write I/O operation is queued to the second node, so as to maintain synchronization of the second node with the resilient group while it operates in the reduced throughput mode.
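
The routing rule in this abstract can be sketched as below. The `Mode` enum and the `assign_read` and `assign_write` functions are hypothetical names, and the sketch assumes a simple policy: reads prefer nodes in normal mode, while writes are queued to every node in the group (including reduced-mode nodes) so that all copies stay synchronized.

```python
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    NORMAL = "normal"
    REDUCED = "reduced"


def assign_read(modes: Dict[str, Mode]) -> str:
    """Pick a node for a read, preferring normal-throughput nodes."""
    if not modes:
        raise ValueError("resilient group has no nodes")
    normal = sorted(name for name, mode in modes.items()
                    if mode is Mode.NORMAL)
    # Fall back to any node only if none is in normal mode.
    candidates = normal or sorted(modes)
    return candidates[0]


def assign_write(modes: Dict[str, Mode]) -> List[str]:
    """Queue a write to every node so reduced-mode nodes stay in sync."""
    return sorted(modes)
```

For a group `{"node-a": Mode.NORMAL, "node-b": Mode.REDUCED}`, reads go to `node-a` while writes are queued to both nodes.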

    FAULT PREDICTION AND DETECTION USING TIME-BASED DISTRIBUTED DATA

    Publication No.: US20200257581A1

    Publication Date: 2020-08-13

    Application No.: US16416107

    Filing Date: 2019-05-17

    Abstract: Performance data is collected for input/output operations executed at a storage device of a plurality of storage devices of a software-defined storage network. Based on the collected performance data, a time-based I/O performance profile for the storage device is determined. A characteristic time-based I/O performance profile is determined for a representative group of storage devices having common characteristics with the storage device and based on previously collected performance data for devices of the representative group. It is determined that the difference between the time-based I/O performance profile for the storage device and the characteristic time-based I/O performance profile exceeds a predetermined deviance threshold that is indicative of a probable failure of the storage device. An indication is generated that the storage device exceeded the predetermined deviance threshold.
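
The threshold check in this abstract can be sketched as follows. The abstract does not say how profiles are built or compared, so the choices here are assumptions: a profile is the mean latency per fixed-width time bucket, and the deviation is the mean absolute difference over buckets present in both profiles. All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple


def time_profile(samples: Iterable[Tuple[float, float]],
                 bucket_seconds: float) -> Dict[int, float]:
    """Mean latency per time bucket from (timestamp, latency_ms) samples."""
    buckets: Dict[int, list] = {}
    for timestamp, latency_ms in samples:
        buckets.setdefault(int(timestamp // bucket_seconds), []).append(latency_ms)
    return {bucket: mean(values) for bucket, values in buckets.items()}


def deviation(device: Dict[int, float],
              characteristic: Dict[int, float]) -> float:
    """Mean absolute difference over buckets present in both profiles."""
    shared = device.keys() & characteristic.keys()
    if not shared:
        return 0.0
    return mean(abs(device[b] - characteristic[b]) for b in shared)


def exceeds_threshold(device: Dict[int, float],
                      characteristic: Dict[int, float],
                      threshold: float) -> bool:
    """True when the device deviates from its group beyond the threshold."""
    return deviation(device, characteristic) > threshold
```

Here `characteristic` would be computed from previously collected data for a representative group of similar devices, and a `True` result corresponds to the indication that the device exceeded the deviance threshold.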

    DETECTING AND RECOVERING FROM FATAL STORAGE ERRORS

    Publication No.: US20230289249A1

    Publication Date: 2023-09-14

    Application No.: US18319354

    Filing Date: 2023-05-17

    CPC classification number: G06F11/0772 G06F11/0745 G06F11/1441 G06F11/1471

    Abstract: The present disclosure relates to systems, methods, and computer-readable media for identifying and responding to a panic condition on a storage system on a computing node. For example, systems disclosed herein may establish recovery instructions between a host system and a storage system for responding to a future instance of a panic condition. The storage system may provide an indication of a self-detected panic condition in a variety of ways. In response to identifying the panic condition, the host system may perform one or more recovery actions in accordance with recovery instructions accessible to the host system. These actions may include resetting specific components and reinitializing communication between the host system and the storage system, which is less invasive than the slower and more expensive conventional approaches for responding to panic conditions on computing nodes.
