Method and system for data transfer between compute clusters and file system

    Publication No.: US09628299B2

    Publication date: 2017-04-18

    Application No.: US14045170

    Filing date: 2013-10-03

    CPC classification number: G06F17/303 G06F17/30194 H04L12/6418 H04L67/06

    Abstract: A data migrating system and method are provided in which a Burst Buffer Network Aggregator (BBNA) process is configured either on the File Servers or on the File System's dedicated I/O nodes to coalesce data fragments stored in participating Burst Buffer nodes under the direction of a primary BB node appointed by a data generating entity prior to transfer of the full data stripe into the File System. The “write” request in the form of a full data stripe is distributed into a plurality of data fragments among participating BB nodes along with corresponding metadata. The primary BB node gathers the metadata from the participating BB nodes, sends the metadata list to the BBNA unit, responsive to which the BBNA unit allocates a buffer sufficient to store the full data stripe, and transfers data fragments from participating BB nodes into the full data stripe buffer, thereby coalescing the data fragments into the full data stripe, which is subsequently transferred from the buffer in the BBNA unit into the File System.
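The coalescing step in this abstract (the primary BB node gathers fragment metadata, then the BBNA allocates a full-stripe buffer and pulls each fragment into place) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the patented implementation. `FragmentMeta`, `BBNode`, `primary_gather_metadata` and `bbna_coalesce` are hypothetical names. Fragment metadata is assumed to be just a stripe offset and a length. The final transfer of the stripe into the File System is not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FragmentMeta:
    """Hypothetical per-fragment metadata reported by a BB node."""
    node_id: str        # BB node holding the fragment
    stripe_offset: int  # byte offset of the fragment within the full stripe
    length: int

class BBNode:
    """Hypothetical stand-in for a participating Burst Buffer node."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.fragments = {}  # stripe offset -> fragment bytes

    def store(self, stripe_offset, data):
        self.fragments[stripe_offset] = bytes(data)

    def metadata(self):
        return [FragmentMeta(self.node_id, off, len(d)) for off, d in self.fragments.items()]

    def read(self, stripe_offset):
        return self.fragments[stripe_offset]

def primary_gather_metadata(nodes):
    """The primary BB node collects fragment metadata from every participant."""
    return [m for node in nodes for m in node.metadata()]

def bbna_coalesce(metadata_list, nodes_by_id):
    """BBNA: size a buffer from the metadata list, then copy fragments into it."""
    buf = bytearray(sum(m.length for m in metadata_list))  # full-stripe buffer
    expected = 0
    for m in sorted(metadata_list, key=lambda m: m.stripe_offset):
        if m.stripe_offset != expected:
            raise ValueError(f"gap or overlap at stripe offset {expected}")
        buf[m.stripe_offset:m.stripe_offset + m.length] = nodes_by_id[m.node_id].read(m.stripe_offset)
        expected += m.length
    return bytes(buf)

# Usage: a 12-byte "write" split into three 4-byte fragments, one per BB node
nodes = [BBNode("bb0"), BBNode("bb1"), BBNode("bb2")]
stripe = b"ABCDEFGHIJKL"
for i, node in enumerate(nodes):
    node.store(i * 4, stripe[i * 4:(i + 1) * 4])
meta = primary_gather_metadata(nodes)
full = bbna_coalesce(meta, {n.node_id: n for n in nodes})
```

Sizing the buffer from the gathered metadata, and rejecting gaps or overlaps, mirrors the abstract's point that the BBNA allocates a buffer "sufficient to store the full data stripe" only after it receives the metadata list.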

    System and method for scale-out node-local data caching using network-attached non-volatile memories

    Publication No.: US09900397B1

    Publication date: 2018-02-20

    Application No.: US15016774

    Filing date: 2016-02-05

    Abstract: The data caching system and routine leverage the properties of Network-Attached Non-Volatile Memories (NANVMs) to provide virtualized, secure node-local storage services to network users while reducing data movement across the NANVMs. The caching routine reserves storage resources (storage partitions) on NANVM devices, migrates the data required to execute the target application to the allocated storage partitions, and directs network clients to dynamically “mount” the storage partitions based on application data requirements. Only clients and applications that present valid credentials and satisfactory computing capabilities can access the data in a given storage partition. Several clients can access the same storage partitions without duplicating or replicating the data. A Global Data Indexing Sub-System supports efficient operation of the system by providing a mapping between storage partitions, data sets, applications and client nodes, as well as their credentials/capabilities.
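The Global Data Indexing Sub-System described here maps partitions to data sets, applications and client credentials/capabilities, and it gates "mount" requests. A toy version of that lookup-and-check path might look like the sketch below. It is hypothetical throughout: `PartitionEntry`, `GlobalDataIndex`, `mount`, the credential tokens and the capability strings are illustrative assumptions, and real NANVM partition reservation and data migration are not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class PartitionEntry:
    """One index record: a storage partition and who may mount it (hypothetical)."""
    partition_id: str
    dataset: str
    application: str
    required_capabilities: frozenset
    allowed_credentials: frozenset
    mounted_clients: set = field(default_factory=set)

class GlobalDataIndex:
    """Toy global index mapping data sets to NANVM partitions and access rules."""
    def __init__(self):
        self._partitions = {}  # partition id -> PartitionEntry
        self._by_dataset = {}  # data set name -> partition id

    def register(self, entry):
        self._partitions[entry.partition_id] = entry
        self._by_dataset[entry.dataset] = entry.partition_id

    def mount(self, client_id, credential, capabilities, dataset):
        """Grant a client access to the partition holding `dataset`, or raise."""
        entry = self._partitions[self._by_dataset[dataset]]
        if credential not in entry.allowed_credentials:
            raise PermissionError(f"{client_id}: invalid credential")
        if not entry.required_capabilities <= set(capabilities):
            raise PermissionError(f"{client_id}: insufficient capabilities")
        # Clients share the same partition; no copy of the data is made
        entry.mounted_clients.add(client_id)
        return entry.partition_id

# Usage: two authorized clients share one partition; a bad credential is refused
index = GlobalDataIndex()
entry = PartitionEntry("nvm0:p3", "sim-input", "solver",
                       frozenset({"rdma"}), frozenset({"tokenA"}))
index.register(entry)
pid1 = index.mount("client1", "tokenA", {"rdma", "gpu"}, "sim-input")
pid2 = index.mount("client2", "tokenA", {"rdma"}, "sim-input")
try:
    index.mount("client3", "tokenB", {"rdma"}, "sim-input")
    rejected = False
except PermissionError:
    rejected = True
```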

    Method and system for reclamation of distributed dynamically generated erasure groups for data migration between high performance computing architectures and data storage using non-deterministic data addressing

    Publication No.: US09378088B1

    Publication date: 2016-06-28

    Application No.: US14586346

    Filing date: 2014-12-30

    Abstract: The present invention is directed to data migration, and particularly Parity Group migration, between high-performance data generating entities and a data storage structure in which distributed NVM arrays are used as a single intermediate logical storage. This intermediate storage requires a global registry/addressing capability that facilitates the storage and retrieval of locality information (metadata) for any given fragment of unstructured data. As part of the non-deterministic data addressing system, Parity Group Identifiers and Parity Group Information (PGI) descriptors, which track the members of each Parity Group, are created and distributed in the intermediate distributed NVM arrays to ensure coherency and fault tolerance for both the data and the metadata. The PGI descriptors act as collection points for state describing the residency and replay status of Parity Group members.
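The abstract describes PGI descriptors that collect, per Parity Group member, where the member resides and whether it has been replayed. The sketch below models that bookkeeping. It assumes single-parity XOR as the erasure code, which the abstract does not specify. It also assumes that "replay" means the member has been flushed from the NVM arrays to backing storage. `PGIDescriptor`, `MemberState`, `build_parity_group` and `xor_bytes` are hypothetical names. Global registry lookups and distribution of the descriptors themselves are not modeled.

```python
from dataclasses import dataclass, field

def xor_bytes(blocks):
    """Byte-wise XOR of equal-length blocks (single-parity erasure code)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

@dataclass
class MemberState:
    location: str           # NVM array currently holding the member (residency)
    replayed: bool = False  # assumed: True once replayed to backing storage

@dataclass
class PGIDescriptor:
    """Hypothetical Parity Group Information record tracking every member."""
    pg_id: int
    members: dict = field(default_factory=dict)  # member name -> MemberState

    def add_member(self, name, location):
        self.members[name] = MemberState(location)

    def mark_replayed(self, name):
        self.members[name].replayed = True

    def fully_replayed(self):
        return all(m.replayed for m in self.members.values())

def build_parity_group(pg_id, data_blocks, locations):
    """Place data members plus one XOR parity member, one per NVM array."""
    pgi = PGIDescriptor(pg_id)
    store = {}
    for i, (block, loc) in enumerate(zip(data_blocks, locations)):
        store[f"d{i}"] = block
        pgi.add_member(f"d{i}", loc)
    store["p"] = xor_bytes(data_blocks)
    pgi.add_member("p", locations[len(data_blocks)])
    return pgi, store

# Usage: three data members plus parity spread over four NVM arrays
blocks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
pgi, store = build_parity_group(7, blocks, ["nvm0", "nvm1", "nvm2", "nvm3"])
# If the member on nvm1 is lost, XOR of the survivors and parity rebuilds it
rebuilt = xor_bytes([store["d0"], store["d2"], store["p"]])
before = pgi.fully_replayed()
for name in pgi.members:
    pgi.mark_replayed(name)
after = pgi.fully_replayed()
```

In this reading, a Parity Group whose descriptor reports `fully_replayed()` no longer needs its NVM copies, which is the kind of state a reclamation step could consult.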

    Maintaining order and fault-tolerance in a distributed hash table system

    Publication No.: US09152649B2

    Publication date: 2015-10-06

    Application No.: US14050156

    Filing date: 2013-10-09

    CPC classification number: G06F17/30227 G06F17/30094

    Abstract: Data storage systems and methods for storing data are described herein. The storage system includes a first storage node that is configured to issue a first delivery request to a first set of other storage nodes in the storage system, the first delivery request including at least one first data operation for each node in the first set, and to issue at least one other delivery request while the first delivery request remains outstanding, the at least one other delivery request including a first commit request for each node in the first set. The first node causes the at least one first data operation to be made active within the storage system in response to receiving a commit indicator together with a delivery acknowledgement for one of the other delivery requests.
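The key idea in this abstract is pipelining: the commit request is sent while the delivery request carrying the data operations is still outstanding, and the operations become active only once the commit acknowledgement arrives. The `asyncio` sketch below shows that ordering. It assumes each peer has a FIFO inbound channel, so a commit can never overtake its delivery. `Peer`, `issue`, the message tuples and the string operations are hypothetical. Fault handling and redelivery, which the patent title also covers, are not modeled.

```python
import asyncio

class Peer:
    """Hypothetical storage node with an ordered (FIFO) inbound channel."""
    def __init__(self, name):
        self.name = name
        self.inbox = asyncio.Queue()
        self.staged = {}  # request id -> data operations awaiting commit
        self.active = []  # operations made active

    async def run(self):
        while True:
            kind, req_id, payload, reply = await self.inbox.get()
            if kind == "deliver":
                self.staged[req_id] = payload  # hold, do not apply yet
                reply.set_result(("delivered", req_id))
            elif kind == "commit":
                self.active.extend(self.staged.pop(req_id))
                reply.set_result(("committed", req_id))

async def issue(peers, req_id, ops_for):
    """Send deliveries, then commits without waiting for delivery acks."""
    loop = asyncio.get_running_loop()
    for p in peers:
        p.inbox.put_nowait(("deliver", req_id, ops_for[p.name], loop.create_future()))
    commits = []
    for p in peers:  # deliveries are still outstanding at this point
        fut = loop.create_future()
        p.inbox.put_nowait(("commit", req_id, None, fut))
        commits.append(fut)
    results = await asyncio.gather(*commits)
    # The operations count as active once every commit is acknowledged
    return all(kind == "committed" for kind, _ in results)

async def main():
    peers = [Peer("n1"), Peer("n2")]
    workers = [asyncio.create_task(p.run()) for p in peers]
    ok = await issue(peers, 1, {"n1": ["put a=1"], "n2": ["put b=2"]})
    for w in workers:
        w.cancel()
    return ok, [p.active for p in peers]

ok, active = asyncio.run(main())
```

Creating the peers inside `main()` keeps their queues on the event loop that `asyncio.run` starts.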

    Method for data transfer between compute clusters and file system

    Publication No.: US10042869B1

    Publication date: 2018-08-07

    Application No.: US15016562

    Filing date: 2016-02-05

    Abstract: A data migrating system and method are provided in which a Burst Buffer Network Aggregator (BBNA) process is configured either on the File Servers or on the File System's dedicated I/O nodes to coalesce data fragments stored in participating Burst Buffer nodes under the direction of a primary BB node appointed by a data generating entity prior to transfer of the full data stripe into the File System. The “write” request in the form of a full data stripe is distributed into a plurality of data fragments among participating BB nodes along with corresponding metadata. The primary BB node gathers the metadata from the participating BB nodes, sends the metadata list to the BBNA unit, responsive to which the BBNA unit allocates a buffer sufficient to store the full data stripe, and transfers data fragments from participating BB nodes into the full data stripe buffer, thereby coalescing the data fragments into the full data stripe, which is subsequently transferred from the buffer in the BBNA unit into the File System.

    Data storage system with active power management and method for monitoring and dynamical control of power sharing between devices in data storage system

    Publication No.: US09477279B1

    Publication date: 2016-10-25

    Application No.: US14293047

    Filing date: 2014-06-02

    CPC classification number: G06F11/3062 G06F1/206 G06F1/3206 Y02D10/16

    Abstract: A data storage system is implemented with active power monitoring and control performed by a control node elected from among a number of nodes. Real-time power monitoring information is supplied to the control node by power monitoring logic residing at each device in the system. Each device in the data storage system is pre-allocated an individual power budget that is below its maximum power usage. Together, the power budgets of all the equipment constitute the cumulative power budget assigned to the group of equipment. The control node dynamically controls, in real time, power sharing among the devices, so that devices whose required power usage is below their pre-allocated budget can share their extra power credits with devices that need extra power to perform their operations. In this way the control node shares power among the equipment in the data storage system with the goal of never exceeding the cumulative power budget assigned to the entire system or cluster of equipment.
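The credit-sharing rule in this abstract has three parts. Devices that need less than their budget donate the difference to a pool. Devices that need more draw from that pool. The total never exceeds the cumulative budget. The abstract does not say how the pool is divided among several needy devices, so the sketch below assumes a max-min fair ("water-filling") split. `share_power` is a hypothetical name. Leader election and real-time telemetry collection are not modeled, and demands are assumed already capped at each device's maximum power.

```python
def share_power(budgets, demands):
    """Reallocate unused power credits without exceeding the cumulative budget.

    budgets: device -> pre-allocated budget (watts)
    demands: device -> currently required power (watts)
    """
    # Every device first gets what it needs, up to its own budget
    alloc = {d: min(budgets[d], demands[d]) for d in budgets}
    # Credits left unused by under-budget devices form a shared pool
    pool = sum(budgets.values()) - sum(alloc.values())
    # Devices needing more than their budget, smallest shortfall first
    needy = sorted((d for d in budgets if demands[d] > budgets[d]),
                   key=lambda d: demands[d] - budgets[d])
    for i, d in enumerate(needy):
        fair_share = pool / (len(needy) - i)  # split what is left evenly
        extra = min(demands[d] - alloc[d], fair_share)
        alloc[d] += extra
        pool -= extra
    return alloc

# Usage: the controller borrows credits the two SSDs are not using
budgets = {"ctrl0": 100, "ssd0": 50, "ssd1": 50}
demands = {"ctrl0": 130, "ssd0": 20, "ssd1": 45}
alloc = share_power(budgets, demands)
```

Serving the smallest shortfall first means a device that needs only a little extra is satisfied completely, and what remains is split evenly among the larger consumers.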

    Reducing metadata in a write-anywhere storage system

    Publication No.: US20140108723A1

    Publication date: 2014-04-17

    Application No.: US14056265

    Filing date: 2013-10-17

    Abstract: Systems and methods for reducing metadata in a write-anywhere storage system are disclosed herein. The system includes a plurality of clients coupled with a plurality of storage nodes, each storage node having a plurality of primary storage devices coupled to it. The client includes a memory management unit with cache memory, which serves as a cache for data produced by the clients before that data is stored in primary storage. The cache includes an extent cache, an extent index, a commit cache and a commit index. Movement of data and metadata is managed using an interval tree, and methods for reducing the data held in the interval tree improve the data storage and data retrieval performance of the system.
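The abstract says metadata is reduced by shrinking what the interval tree holds, but it does not give the method. One common way to get that effect is to trim or split older extents that a new write overlaps, then merge neighbours that are logically adjacent and physically contiguous on the same node. The sketch below does this, using a sorted list of non-overlapping extents in place of a real interval tree; inserts are O(n) here. `ExtentMap`, its tuple layout `(start, end, node, phys_start)` and the node names are hypothetical.

```python
import bisect

class ExtentMap:
    """Sorted, non-overlapping extents (start, end, node, phys_start); stands in for an interval tree."""
    def __init__(self):
        self.extents = []

    def insert(self, start, end, node, phys):
        """Record a write of [start, end); newer data replaces any overlap."""
        kept = []
        for s, e, n, p in self.extents:
            if e <= start or s >= end:  # no overlap: keep unchanged
                kept.append((s, e, n, p))
                continue
            if s < start:  # left remainder of the older extent
                kept.append((s, start, n, p))
            if e > end:  # right remainder, physical offset shifted to match
                kept.append((end, e, n, p + (end - s)))
        kept.append((start, end, node, phys))
        kept.sort()
        self.extents = self._coalesce(kept)

    @staticmethod
    def _coalesce(extents):
        """Merge neighbours that are adjacent and physically contiguous on one node."""
        merged = []
        for s, e, n, p in extents:
            if merged:
                ps, pe, pn, pp = merged[-1]
                if pe == s and pn == n and pp + (pe - ps) == p:
                    merged[-1] = (ps, e, pn, pp)
                    continue
            merged.append((s, e, n, p))
        return merged

    def lookup(self, offset):
        """Return (node, physical offset) for a logical offset, or None."""
        i = bisect.bisect_right(self.extents, (offset, float("inf"))) - 1
        if i >= 0:
            s, e, n, p = self.extents[i]
            if s <= offset < e:
                return n, p + (offset - s)
        return None

# Usage: two contiguous writes collapse into one entry; an overwrite splits it
m = ExtentMap()
m.insert(0, 4096, "node1", 0)
m.insert(4096, 8192, "node1", 4096)
after_merge = len(m.extents)
m.insert(1024, 2048, "node2", 0)
```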

    Method and system for data migration between high performance computing architectures and file system using distributed parity group information structures with non-deterministic data addressing

    Publication No.: US09477551B1

    Publication date: 2016-10-25

    Application No.: US14556571

    Filing date: 2014-12-01

    Abstract: The present invention is directed to data migration, and particularly Parity Group migration, between high-performance data generating entities and a data storage structure in which distributed NVM arrays are used as a single intermediate logical storage. This intermediate storage requires a global registry/addressing capability that facilitates the storage and retrieval of locality information (metadata) for any given fragment of unstructured data. As part of the non-deterministic data addressing system, Parity Group Identifiers and Parity Group Information (PGI) descriptors, which track the members of each Parity Group, are created and distributed in the intermediate distributed NVM arrays to ensure coherency and fault tolerance for both the data and the metadata. The PGI descriptors act as collection points for state describing the residency and replay status of Parity Group members.
