Abstract:
Technology for operating a cache sizing system is disclosed. In various embodiments, the technology monitors input/output (IO) accesses to a storage system within a monitor period; tracks an access map for storage addresses within the storage system during the monitor period; and counts a particular access condition of the IO accesses based on the access map during the monitor period. When sizing a cache of the storage system that enables the storage system to provide a specified level of service, the counting is for computing a working set size (WSS) estimate of the storage system.
Abstract:
Technology is described for actively responding to data storage traffic. The technology can provide an application program interface; receive, via the application program interface, from an application, a command to query a data storage attribute associated with a virtual data storage component; query the associated virtual data storage component; and return to the application a value for the data storage attribute.
Abstract:
Graph transformations are used by a data management system to correct violations of service-level objectives (SLOs) in a data center. In one aspect, a process is provided to manage a data center by receiving an indication of a violation of a service-level objective associated with the data center from a server in the data center. A graph representation and a transformations data container are retrieved by the data management system from data storage accessible to the data management system. The transformations data container includes one or more transformations. The transformation is processed to create a mutated graph from a data center representation from the graph representation. An option for managing the data center is determined as a result of evaluating the mutated graphs.
Abstract:
It is detected that a metric associated with a first workload has breached a first threshold. It is determined that the first workload and a second workload access the same storage resources, wherein the storage resources are associated with a storage server. It is determined that the metric is impacted by the first workload and the second workload accessing the same storage resources. A candidate solution is identifier. An estimated impact of a residual workload is determined based, at least in part, on the candidate solution. A level of caching of at least one of the first workload or the second workload is adjusted based, at least in part, on the estimated impact of the residual workload.
Abstract:
It is detected that a metric associated with a first workload has breached a first threshold. It is determined that the first workload and a second workload access the same storage resources, wherein the storage resources are associated with a storage server. It is determined that the metric is impacted by the first workload and the second workload accessing the same storage resources. A candidate solution is identifier. An estimated impact of a residual workload is determined based, at least in part, on the candidate solution. A level of caching of at least one of the first workload or the second workload is adjusted based, at least in part, on the estimated impact of the residual workload.
Abstract:
Technology is described for a profile-based lifecycle management for data storage servers. The technology can receive a profile, monitor events emitted by devices of the data storage system, determine based on the monitored events that a device of the storage system matches the indicated condition, and perform the action corresponding to the indicated condition, wherein the action includes managing data stored by the data storage system. The received profile can indicate a condition and an action corresponding to the condition.
Abstract:
It is detected that a metric associated with a first workload has breached a first threshold. It is determined that the first workload and a second workload access the same storage resources, wherein the storage resources are associated with a storage server. It is determined that the metric is impacted by the first workload and the second workload accessing the same storage resources. A candidate solution is identifier. An estimated impact of a residual workload is determined based, at least in part, on the candidate solution. A level of caching of at least one of the first workload or the second workload is adjusted based, at least in part, on the estimated impact of the residual workload.
Abstract:
Methods, systems, and computer executable instructions for performing distributed data analytics are provided. In one exemplary embodiment, a method of performing a distributed data analytics job includes collecting application-specific information in a processing node assigned to perform a task to identify data necessary to perform the task. The method also includes requesting a chunk of the necessary data from a storage server based on location information indicating one or more locations of the data chunk and prioritizing the request relative to other data requests associated with the job. The method also includes receiving the data chunk from the storage server in response to the request and storing the data chunk in a memory cache of the processing node which uses a same file system as the storage server.
Abstract:
Systems, devices, and methods are described for performing content-aware task assignment. A resource manager in a distributed computing system can identify tasks associated with a file. Each task can involve processing multiple data blocks of the file (e.g., in parallel with other processing by other tasks). The resource manager can provide block identifiers for the blocks to each of multiple computing nodes. Each computing node can store a respective subset of the blocks in a respective cache storage medium. Each subset of blocks stored at a node can be identified from the block identifiers. The resource manager can assign the task to a selected one of the computing nodes. The task can be assigned based on the selected computing node having larger subset of the blocks than one or more other computing nodes in the distributed computing system. In some embodiments, computing nodes can de-duplicate cached data using block identifiers.
Abstract:
It is detected that a metric associated with a first workload has breached a first threshold. It is determined that the first workload and a second workload access the same storage resources, wherein the storage resources are associated with a storage server. It is determined that the metric is impacted by the first workload and the second workload accessing the same storage resources. A candidate solution is identifier. An estimated impact of a residual workload is determined based, at least in part, on the candidate solution. A level of caching of at least one of the first workload or the second workload is adjusted based, at least in part, on the estimated impact of the residual workload.