Abstract:
Technology is disclosed for managing data in a distributed processing system ("the technology"). In various embodiments, the technology pushes "cold" data from a primary storage of the distributed processing system to a backup storage, thereby maximizing the space on the primary storage available for "hot" data, on which most data processing activities in the distributed processing system are performed. The cold data is retrieved from the backup storage into the primary storage on demand, for example, upon receiving an access request from a client. While the primary storage stores the data in a format specific to the distributed processing system, the backup storage stores the data in a different format, for example, a format corresponding to the type of backup storage.
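The push and on-demand retrieval described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the disclosure: the class name `TieredStore`, the use of an idle-time threshold to decide what counts as "cold", and the choice of JSON text as the backup storage's "different format".

```python
import json
import time


class TieredStore:
    """Toy model: primary holds native objects, backup holds JSON text."""

    def __init__(self, cold_after_seconds):
        # Assumption: data is "cold" once it has been idle this long.
        self.cold_after_seconds = cold_after_seconds
        self.primary = {}      # key -> native value (hot data)
        self.backup = {}       # key -> JSON text (cold data, different format)
        self.last_access = {}  # key -> timestamp of the last read or write

    def put(self, key, value, now=None):
        now = time.time() if now is None else now
        self.primary[key] = value
        self.backup.pop(key, None)  # a fresh write supersedes any cold copy
        self.last_access[key] = now

    def push_cold(self, now=None):
        """Move entries that have gone idle from primary to backup."""
        now = time.time() if now is None else now
        cold_keys = [k for k in self.primary
                     if now - self.last_access[k] >= self.cold_after_seconds]
        for key in cold_keys:
            # Convert to the backup format while freeing primary space.
            self.backup[key] = json.dumps(self.primary.pop(key))
        return cold_keys

    def get(self, key, now=None):
        """Serve from primary; on a miss, pull the data back from backup."""
        now = time.time() if now is None else now
        if key not in self.primary:
            if key not in self.backup:
                raise KeyError(key)
            # Convert back to the primary format on demand.
            self.primary[key] = json.loads(self.backup.pop(key))
        self.last_access[key] = now
        return self.primary[key]
```

In this sketch a client read through `get` is what triggers retrieval, which mirrors the "upon receiving an access request from a client" example; a real system would likely use a richer coldness policy than a single idle threshold.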
Abstract:
Described herein is a system and method for a scalable crash-consistent snapshot operation. Write requests may be received from an application and a snapshot creation request may further be received. Write requests received before the snapshot creation request may be associated with pre-snapshot tags and write requests received after the snapshot creation request may be associated with post-snapshot tags. Furthermore, in response to the snapshot creation request, logical interfaces may begin to be switched from a pre-snapshot configuration to a post-snapshot configuration. The snapshot may then be created based on the pre-snapshot write requests and the post-snapshot write requests may be suspended until the logical interfaces have switched configuration.
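The tagging and suspension behavior described above can be sketched as a small Python model. This is a simplified illustration under stated assumptions, not the disclosed implementation: the class `SnapshotCoordinator`, its method names, and the idea that pre-snapshot writes are applied synchronously (so the snapshot can be taken as a copy at request time) are all hypothetical.

```python
class SnapshotCoordinator:
    """Toy model of the tagging and suspension steps (names hypothetical)."""

    def __init__(self, interface_names):
        # Every logical interface starts in the pre-snapshot configuration.
        self.interfaces = {name: "pre" for name in interface_names}
        self.snapshot_requested = False
        self.volume = {}      # current data
        self.snapshot = None  # frozen copy built from pre-snapshot writes
        self.suspended = []   # post-snapshot writes waiting for the switch

    def _all_switched(self):
        return all(state == "post" for state in self.interfaces.values())

    def write(self, key, value):
        """Tag the write by its arrival relative to the snapshot request."""
        tag = "post" if self.snapshot_requested else "pre"
        if tag == "post" and not self._all_switched():
            # Hold the write until every interface has switched.
            self.suspended.append((key, value))
        else:
            self.volume[key] = value
        return tag

    def request_snapshot(self):
        """Start the snapshot; only pre-snapshot writes are in the volume."""
        self.snapshot_requested = True
        self.snapshot = dict(self.volume)

    def switch_interface(self, name):
        """Move one interface to the post-snapshot configuration."""
        self.interfaces[name] = "post"
        if self._all_switched():
            # Release suspended writes in their original arrival order.
            for key, value in self.suspended:
                self.volume[key] = value
            self.suspended.clear()
```

Because post-snapshot writes are held back until the last interface switches, the snapshot in this sketch never observes a write that arrived after the snapshot creation request, which is the crash-consistency property the abstract is after.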
Abstract:
Methods and apparatuses for efficiently migrating deduplicated data are provided. In one example, a data management system includes a data storage volume, a memory including machine executable instructions, and a computer processor. The data storage volume includes data objects and free storage space. The computer processor executes the instructions to perform deduplication of the data objects and determine migration efficiency metrics for groups of the data objects. Determining the migration efficiency metrics includes determining, for each group, a relationship between the free storage space that will result if the group is migrated from the volume and the resources required to migrate the group from the volume.
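One way to picture such a metric is the short Python sketch below. It rests on assumptions that the abstract does not state: the "relationship" is taken to be a simple ratio, the resource cost is taken to be the number of bytes that must be transferred, and the function name `migration_efficiency` and the fixed block size are hypothetical. The point it illustrates is that, after deduplication, a block shared with another group frees no space when only one of the groups is migrated.

```python
from collections import Counter

BLOCK_SIZE = 4096  # assumed block size in bytes


def migration_efficiency(groups, block_size=BLOCK_SIZE):
    """Return {group: freed_bytes / transferred_bytes} for each group.

    `groups` maps a group name to the IDs of the deduplicated blocks
    that the group's data objects reference.
    """
    # Count how many groups reference each block.
    ref_counts = Counter(
        block for blocks in groups.values() for block in set(blocks)
    )
    metrics = {}
    for name, blocks in groups.items():
        unique_blocks = set(blocks)
        # Only blocks referenced by no other group are freed by migration.
        freed = sum(block_size for b in unique_blocks if ref_counts[b] == 1)
        # Assumed cost: every block the group references must be moved.
        cost = len(unique_blocks) * block_size
        metrics[name] = freed / cost if cost else 0.0
    return metrics


if __name__ == "__main__":
    example = {
        "g1": ["b1", "b2", "b3", "b4"],
        "g2": ["b3", "b4"],
        "g3": ["b5"],
    }
    print(migration_efficiency(example))
```

Under these assumptions a group whose blocks are all shared scores 0.0 and is a poor migration candidate, while a group with no shared blocks scores 1.0.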
Abstract:
The techniques introduced herein provide for systems and methods for creating and managing contention-free multi-path access to a distributed data set in a distributed processing system. In one embodiment, a distributed processing system comprises a plurality of compute nodes. The compute nodes are assembled into compute groups and configured such that each compute group has an attached or local storage system. Various data segments of the distributed data set are stored in data storage objects on the local storage system. The data storage objects are cross-mapped into each of the compute nodes in the compute group so that any compute node in the group can access any of the data segments stored in the local storage system via the respective data storage object.
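The cross-mapping described above can be sketched in a few lines of Python. The data layout and the function name `cross_map` are hypothetical, chosen only to show the idea that every node in a compute group receives a mapping to every data storage object on that group's local storage system.

```python
def cross_map(compute_groups, storage_objects):
    """Return {node: {segment_id: storage_object}} for every compute node.

    compute_groups:  {group: [node, ...]}
    storage_objects: {group: {storage_object: [segment_id, ...]}}
    Both layouts are assumptions made for this illustration.
    """
    mapping = {}
    for group, nodes in compute_groups.items():
        # Index the group's local storage by data segment.
        segment_index = {
            segment: obj
            for obj, segments in storage_objects[group].items()
            for segment in segments
        }
        # Give every node in the group its own copy of the full index,
        # so any node can reach any segment on the local storage system.
        for node in nodes:
            mapping[node] = dict(segment_index)
    return mapping


if __name__ == "__main__":
    groups = {"g1": ["n1", "n2"], "g2": ["n3"]}
    objects = {
        "g1": {"obj0": ["s1", "s2"], "obj1": ["s3"]},
        "g2": {"obj2": ["s4"]},
    }
    print(cross_map(groups, objects))
```

The sketch covers only the mapping step; how access through the mapped objects is kept contention-free is not modeled here.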