HEALING FAILED ERASURE-CODED WRITE ATTEMPTS IN A DISTRIBUTED DATA STORAGE SYSTEM CONFIGURED WITH FEWER STORAGE NODES THAN DATA PLUS PARITY FRAGMENTS

    Publication No.: US20220019355A1

    Publication Date: 2022-01-20

    Application No.: US17336081

    Filing Date: 2021-06-01

    Abstract: A distributed data storage system using erasure coding (EC) provides advantages of EC data storage while retaining high resiliency for EC data storage architectures having fewer data storage nodes than the number of EC data-plus-parity fragments. To ameliorate the effects of certain storage node outages or fatal disk failures, incoming data is temporarily replicated so that read and write operations can continue from/to the storage system. The system automatically heals failed EC write attempts in a manner transparent to users and/or applications: when all storage nodes are operational, the distributed data storage system automatically converts the temporarily replicated data to EC storage and reclaims storage space previously used by the temporarily replicated data. Individual hardware failures are healed through migration techniques that reconstruct and re-fragment data blocks according to the governing EC scheme. An illustrative embodiment is a three-node data storage system using EC 4+2.
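Why three nodes can still hold EC 4+2 data, and why a write made during an outage is risky, comes down to fragment arithmetic. The sketch below assumes a simple round-robin placement of the six fragments (the abstract does not specify the placement policy). The node names and the helpers `place_fragments` and `readable` are hypothetical. With all three nodes up, each holds two fragments, so losing any single node leaves the four needed to decode. A stripe written while one node is down must pack three fragments onto each surviving node, and a later loss of either node would leave too few fragments. That gap is the one temporary replication covers.

```python
from itertools import combinations

K, M = 4, 2                            # EC 4+2: 4 data + 2 parity fragments
NODES = ["node1", "node2", "node3"]    # hypothetical node names

def place_fragments(nodes, k=K, m=M):
    """Assign the k+m fragments of one stripe to nodes round-robin (assumed policy)."""
    placement = {n: [] for n in nodes}
    for frag in range(k + m):
        placement[nodes[frag % len(nodes)]].append(frag)
    return placement

def readable(placement, failed, k=K):
    """A stripe can be decoded if at least k fragments remain on surviving nodes."""
    remaining = sum(len(frags) for node, frags in placement.items() if node not in failed)
    return remaining >= k

healthy = place_fragments(NODES)
print(healthy)  # {'node1': [0, 3], 'node2': [1, 4], 'node3': [2, 5]}
for failed in combinations(NODES, 1):
    print("lose", failed, "->", readable(healthy, set(failed)))  # True for each node
print("lose node1+node2 ->", readable(healthy, {"node1", "node2"}))  # False

# Writing the same stripe while node3 is down forces 3 fragments per node,
# so a later loss of either remaining node leaves only 3 < K fragments.
degraded = place_fragments(["node1", "node2"])
print(degraded)  # {'node1': [0, 2, 4], 'node2': [1, 3, 5]}
print("lose node1 ->", readable(degraded, {"node1"}))  # False
```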

    DISTRIBUTED DATA STORAGE SYSTEM USING ERASURE CODING ON STORAGE NODES FEWER THAN DATA PLUS PARITY FRAGMENTS

    Publication No.: US20220019372A1

    Publication Date: 2022-01-20

    Application No.: US17336103

    Filing Date: 2021-06-01

    Abstract: A distributed data storage system using erasure coding (EC) provides advantages of EC data storage while retaining high resiliency for EC data storage architectures having fewer data storage nodes than the number of EC data-plus-parity fragments. An illustrative embodiment is a three-node data storage system with EC 4+2. Incoming data is temporarily replicated to ameliorate the effects of certain storage node outages or fatal disk failures, so that read and write operations can continue from/to the storage system. The system is equipped to automatically heal failed EC write attempts in a manner transparent to users and/or applications: when all storage nodes are operational, the distributed data storage system automatically converts the temporarily replicated data to EC storage and reclaims storage space previously used by the temporarily replicated data. Individual hardware failures are healed through migration techniques that reconstruct and re-fragment data blocks according to the governing EC scheme.
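The fallback-and-heal cycle described in this abstract can be modeled as bookkeeping. The following is a minimal sketch under several assumptions not stated in the abstract: a failed EC write keeps one full copy on every healthy node, healing runs only when all nodes are up, and space is counted as raw bytes. The `Cluster` class and its methods are hypothetical, and no actual erasure encoding is performed. The point is the state change from "replicated" to "ec" and the resulting space reclaimed (2x for two replicas versus 1.5x for EC 4+2).

```python
from dataclasses import dataclass, field

K, M = 4, 2  # EC 4+2

@dataclass
class Cluster:
    nodes: set
    healthy: set
    # block_id -> (mode, locations); mode is "ec" or "replicated"
    blocks: dict = field(default_factory=dict)

    def _ec_layout(self):
        ordered = sorted(self.nodes)
        return [ordered[i % len(ordered)] for i in range(K + M)]

    def write(self, block_id):
        if self.healthy == self.nodes:
            self.blocks[block_id] = ("ec", self._ec_layout())
        else:
            # EC write cannot be placed safely: keep a full copy on every healthy node
            self.blocks[block_id] = ("replicated", sorted(self.healthy))

    def heal(self):
        """Re-encode temporarily replicated blocks once every node is back; return count."""
        if self.healthy != self.nodes:
            return 0
        converted = 0
        for block_id, (mode, _) in list(self.blocks.items()):
            if mode == "replicated":
                # re-fragment as EC; the replica copies are then reclaimed
                self.blocks[block_id] = ("ec", self._ec_layout())
                converted += 1
        return converted

    def space_used(self, block_size):
        """Raw bytes consumed: one full copy per replica, or (K+M)/K of the block for EC."""
        total = 0
        for mode, locations in self.blocks.values():
            if mode == "replicated":
                total += block_size * len(locations)
            else:
                total += block_size * (K + M) // K
        return total

cluster = Cluster(nodes={"n1", "n2", "n3"}, healthy={"n1", "n2"})
cluster.write("b1")
print(cluster.blocks["b1"])       # ('replicated', ['n1', 'n2'])
print(cluster.space_used(1024))   # 2048
cluster.healthy.add("n3")
print(cluster.heal())             # 1
print(cluster.blocks["b1"])       # ('ec', ['n1', 'n2', 'n3', 'n1', 'n2', 'n3'])
print(cluster.space_used(1024))   # 1536
```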

    ANTI-ENTROPY-BASED METADATA RECOVERY IN A STRONGLY CONSISTENT DISTRIBUTED DATA STORAGE SYSTEM

    Publication No.: US20230418716A1

    Publication Date: 2023-12-28

    Application No.: US18458377

    Filing Date: 2023-08-30

    Abstract: A strongly consistent distributed data storage system comprises an enhanced metadata service that is capable of fully recovering all metadata that goes missing when a metadata-carrying disk, disks, and/or partition fail. An illustrative recovery service runs automatically or on demand to bring the metadata node back into full service. Advantages of the recovery service include guaranteed full recovery of all missing metadata, including metadata still residing in commit logs, without impacting strong consistency guarantees of the metadata. The recovery service is network-traffic efficient. In preferred embodiments, the recovery service avoids metadata service downtime at the metadata node, thereby reducing the impact of metadata disk failure on the availability of the system. The disclosed metadata recovery techniques are said to be “self-healing” as they do not need manual intervention and instead automatically detect failures and automatically recover from the failures in a non-disruptive manner.
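"Anti-entropy" generally means comparing compact summaries of two replicas and transferring only the parts that differ, which is one way a recovery service can be network-traffic efficient. The sketch below shows that general idea with flat per-bucket SHA-256 digests. It is not the patented method. The bucket count, key format, and function names are hypothetical. Repair is one-directional and ignores versions and deletions, which a strongly consistent system would have to reconcile.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical digest granularity

def bucket_of(key):
    """Stable bucket index for a key (Python's built-in hash() is salted per process)."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % NUM_BUCKETS

def bucket_digests(store):
    """One SHA-256 digest per bucket over that bucket's key/value pairs in sorted order."""
    hashers = [hashlib.sha256() for _ in range(NUM_BUCKETS)]
    for key in sorted(store):
        hashers[bucket_of(key)].update(f"{key}={store[key]}\n".encode())
    return [h.hexdigest() for h in hashers]

def anti_entropy_repair(local, peer):
    """Pull from `peer` only the buckets whose digests differ; return their indices."""
    local_d, peer_d = bucket_digests(local), bucket_digests(peer)
    mismatched = {i for i in range(NUM_BUCKETS) if local_d[i] != peer_d[i]}
    for key, value in peer.items():
        if bucket_of(key) in mismatched:
            local[key] = value
    return sorted(mismatched)

peer = {f"object/{i}": f"meta-{i}" for i in range(100)}
lost = {"object/7", "object/42", "object/99"}  # entries "lost" with a failed disk
local = {k: v for k, v in peer.items() if k not in lost}
repaired = anti_entropy_repair(local, peer)
print(len(repaired), "of", NUM_BUCKETS, "buckets exchanged")
print(local == peer)  # True
```

Because each missing key affects only its own bucket, at most three of the sixteen buckets are exchanged here, rather than the whole key space.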

    ANTI-ENTROPY-BASED METADATA RECOVERY IN A STRONGLY CONSISTENT DISTRIBUTED DATA STORAGE SYSTEM

    Publication No.: US20220100618A1

    Publication Date: 2022-03-31

    Application No.: US17465722

    Filing Date: 2021-09-02

    Abstract: A strongly consistent distributed data storage system comprises an enhanced metadata service that is capable of fully recovering all metadata that goes missing when a metadata-carrying disk, disks, and/or partition fail. An illustrative recovery service runs automatically or on demand to bring the metadata node back into full service. Advantages of the recovery service include guaranteed full recovery of all missing metadata, including metadata still residing in commit logs, without impacting strong consistency guarantees of the metadata. The recovery service is network-traffic efficient. In preferred embodiments, the recovery service avoids metadata service downtime at the metadata node, thereby reducing the impact of metadata disk failure on the availability of the system. The disclosed metadata recovery techniques are said to be “self-healing” as they do not need manual intervention and instead automatically detect failures and automatically recover from the failures in a non-disruptive manner.
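The abstract says recovery also covers "metadata still residing in commit logs," that is, writes not yet flushed to durable metadata files. One way to combine peer-repaired state with such log entries without rolling anything back is a per-key highest-version-wins merge. The sketch below makes two assumptions the abstract does not confirm: every metadata write carries a monotonically increasing per-key version, and the commit log survives the disk failure. `recover_metadata` and its types are hypothetical.

```python
from typing import Dict, List, Tuple

Versioned = Tuple[int, str]        # (version, value); versions assumed monotonic per key
LogEntry = Tuple[str, int, str]    # (key, version, value) as appended to a commit log

def recover_metadata(repaired: Dict[str, Versioned],
                     commit_log: List[LogEntry]) -> Dict[str, Versioned]:
    """Merge peer-repaired metadata with entries replayed from a surviving commit log.

    For each key the highest version wins, so replay never rolls back a newer value
    fetched from a peer, and writes that existed only in the log are not lost.
    """
    recovered = dict(repaired)
    for key, version, value in commit_log:
        current = recovered.get(key)
        if current is None or version > current[0]:
            recovered[key] = (version, value)
    return recovered

repaired = {"a": (3, "x"), "b": (1, "y")}
log = [("b", 2, "y2"), ("a", 2, "stale"), ("c", 5, "z")]
print(recover_metadata(repaired, log))
# {'a': (3, 'x'), 'b': (2, 'y2'), 'c': (5, 'z')}
```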
