Abstract:
A data manager may include a data opaque interface configured to provide, to an arbitrarily selected page-oriented access method, interface access to page data storage, including latch-free access to that storage. In another aspect, a swap operation may be initiated to move a portion of a first page from cache layer storage to a location in secondary storage by prepending a partial swap delta record to a page state associated with the first page, the partial swap delta record including a main memory address indicating the storage location of a flush delta record that, in turn, indicates the location in secondary storage of the missing part of the first page. In another aspect, a page manager may initiate a flush operation of a first page from cache layer storage to a location in secondary storage, based on atomic operations with flush delta records.
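The delta-record mechanism above lends itself to a compare-and-swap style update. Below is a minimal, illustrative sketch (not the patented implementation) of prepending a partial swap delta record to a page state; the mapping-table slot, the class names, and the lock-emulated CAS are all assumptions made for the example.

```python
# Illustrative sketch only: latch-free prepending of a partial swap delta
# record onto a page state, emulated with a compare-and-swap loop.
import threading
from dataclasses import dataclass
from typing import Optional

@dataclass
class FlushDelta:
    secondary_location: int        # where the flushed part of the page lives on secondary storage
    next: Optional[object] = None

@dataclass
class PartialSwapDelta:
    flush_delta_addr: FlushDelta   # main-memory reference to the flush delta for the missing part
    next: Optional[object] = None  # rest of the page state the delta is prepended to

class MappingTableSlot:
    """One slot holding a page state; CAS is emulated with a lock for portability."""
    def __init__(self, page_state):
        self._state = page_state
        self._lock = threading.Lock()

    def read(self):
        return self._state

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._state is expected:
                self._state = new
                return True
            return False

def prepend_partial_swap(slot: MappingTableSlot, flush_delta: FlushDelta):
    # Retry until the delta is installed atomically on top of the current state,
    # so no latch is ever held on the page itself.
    while True:
        current = slot.read()
        delta = PartialSwapDelta(flush_delta_addr=flush_delta, next=current)
        if slot.compare_and_swap(current, delta):
            return delta
```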
Abstract:
A system for commoditizing data center networking is disclosed. The system includes an interconnection topology for a data center having a plurality of servers and a plurality of network nodes through which data packets may be routed. The system uses a routing scheme that is oblivious to the traffic pattern between nodes in the network, and the interconnection topology contains a plurality of paths between servers. The multipath routing may be Valiant load balancing. The system disaggregates the function of load balancing into a group of regular servers, with the result that load balancing server hardware can be distributed amongst racks in the data center, leading to greater agility and less fragmentation. The architecture creates a huge, flexible switching domain, supporting any server/any service, full mesh agility, and unregimented server capacity at low cost.
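As a rough sketch of the Valiant load balancing idea mentioned above, each flow can be bounced off an intermediate switch chosen independently of the traffic pattern; the switch names and the per-flow selection below are illustrative assumptions, not details from the disclosure.

```python
# Hedged sketch of Valiant load balancing: pick an intermediate "bounce" node
# per flow so traffic spreads over many paths regardless of the traffic matrix.
INTERMEDIATE_SWITCHES = ["int-1", "int-2", "int-3", "int-4"]

def vlb_path(src_rack: str, dst_rack: str, flow_id: int) -> list[str]:
    # Keyed on flow_id so one flow stays on one path while different flows
    # spread uniformly across the intermediate switches.
    bounce = INTERMEDIATE_SWITCHES[flow_id % len(INTERMEDIATE_SWITCHES)]
    return [src_rack, bounce, dst_rack]

print(vlb_path("rack-07", "rack-23", flow_id=42))   # ['rack-07', 'int-3', 'rack-23']
```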
Abstract:
This patent application relates to an agile network architecture that can be employed in data centers, among others. One implementation provides a virtual layer-2 network connecting machines of a layer-3 infrastructure.
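One way such a virtual layer-2 abstraction could sit on a layer-3 infrastructure is via a directory that maps a machine's flat virtual address to its current layer-3 locator; the sketch below is a hedged illustration under that assumption, and the addresses and names are invented for the example.

```python
# Assumption-laden sketch: resolve a virtual (layer-2-style) address to a
# layer-3 locator via a directory, then send over ordinary IP routing.
DIRECTORY = {
    "10.0.0.5": "192.168.1.11",    # virtual address -> current layer-3 locator
    "10.0.0.6": "192.168.2.27",
}

def resolve_and_send(dst_virtual_addr: str, payload: bytes) -> tuple[str, bytes]:
    locator = DIRECTORY[dst_virtual_addr]   # directory lookup replaces flooding/broadcast
    return locator, payload                 # the layer-3 fabric routes on the locator
```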
Abstract:
The subject disclosure is directed towards using primary data deduplication concepts for more efficient access of data via content addressable caches. Chunks of data, such as deduplicated data chunks, are maintained in a fast-access client-side cache, e.g., populated with chunks based upon access patterns. The chunked content is content addressable via a hash or other unique identifier of that content in the system. When a chunk is needed, the client-side cache (or caches) is checked for the chunk before going to a file server for it. The file server may likewise maintain content addressable (chunk) caches. Also described are cache maintenance, management and organization, including pre-populating caches with chunks, as well as using RAM and/or solid-state storage device caches.
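The lookup order described above can be sketched as follows; this is a minimal illustration in which the cache structure, the SHA-256 choice, and fetch_from_server are assumptions rather than details of the disclosure.

```python
# Minimal sketch: chunks are addressed by a hash of their content, the
# client-side cache is consulted first, and only misses go to the file server.
import hashlib

client_cache: dict[str, bytes] = {}       # chunk hash -> chunk bytes

def chunk_id(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

def get_chunk(chunk_hash: str, fetch_from_server) -> bytes:
    chunk = client_cache.get(chunk_hash)
    if chunk is None:                      # cache miss: ask the file server
        chunk = fetch_from_server(chunk_hash)
        client_cache[chunk_hash] = chunk   # populate for later accesses
    return chunk
```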
Abstract:
Memory is shared among physically distinct, networked computing devices. Each computing device comprises a Remote Memory Interface (RMI) accepting commands from locally executing processes and translating such commands into forms transmittable to a remote computing device. The RMI also accepts remote communications directed to it and translates them into commands directed to local memory. The amount of storage capacity shared is coordinated either by a single centralized controller, by a hierarchical collection of controllers, or through peer-to-peer negotiation. Requests directed to remote high-speed non-volatile storage media are detected or flagged, and the process generating the request is suspended such that it can be efficiently revived. The storage capacity provided by remote memory is mapped into the process space of processes executing locally.
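A hedged sketch of the command translation described above follows: a locally visible address range is backed by a remote machine, and reads falling in that range become messages to the owner. The region layout, message format, and transport callable are all assumptions for illustration.

```python
# Illustrative-only sketch of a Remote Memory Interface translating a local
# read into a message for the machine that owns the backing memory.
from dataclasses import dataclass

@dataclass
class RemoteRegion:
    base: int           # start of the process-visible address range
    size: int
    owner: str          # network address of the machine holding the memory

class RemoteMemoryInterface:
    def __init__(self, regions, transport):
        self.regions = regions
        self.transport = transport          # callable(owner, message) -> bytes

    def read(self, addr: int, length: int) -> bytes:
        region = self._find(addr)
        # A purely local access would be served directly; a remote one is
        # turned into a transmittable request for the owning device.
        return self.transport(region.owner, {"op": "read",
                                             "offset": addr - region.base,
                                             "length": length})

    def _find(self, addr: int) -> RemoteRegion:
        for region in self.regions:
            if region.base <= addr < region.base + region.size:
                return region
        raise ValueError("address not mapped to a shared region")
```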
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak times when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be selected based on similarity, including via similarity of signatures, each of which compactly represents a subspace's hashes.
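A rough sketch of the subspace partitioning and reconciliation described above is given below; the choice of file type as the partitioning criterion and the in-memory dictionaries are assumptions made only for illustration.

```python
# Hedged sketch: hashes are indexed per subspace (keyed here by file type),
# and reconciliation finds entries one subspace already holds so the duplicate
# entries and chunks can be removed.
subspace_indexes: dict[str, dict[str, int]] = {}   # subspace -> {chunk hash: chunk location}

def index_chunk(file_type: str, chunk_hash: str, location: int) -> bool:
    index = subspace_indexes.setdefault(file_type, {})
    if chunk_hash in index:
        return False                # duplicate within this subspace; reuse the stored chunk
    index[chunk_hash] = location
    return True

def reconcile(primary: str, secondary: str) -> list[str]:
    """Return hashes duplicated in `secondary` and drop them from its index."""
    duplicates = [h for h in subspace_indexes.get(secondary, {})
                  if h in subspace_indexes.get(primary, {})]
    for h in duplicates:
        del subspace_indexes[secondary][h]
    return duplicates
```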
Abstract:
Systems and methods that distribute load balancing functionalities in a data center. A network of demultiplexers and load balancer servers enables a calculated scaling and growth operation, wherein the capacity of the load balancing operation can be adjusted by changing the number of load balancer servers. Accordingly, load balancing functionality/design can be disaggregated to increase resilience and flexibility for both the load balancing and switching mechanisms of the data center.
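A minimal sketch of how a demultiplexer might spread flows over an adjustable pool of load balancer servers is shown below; the hashing choice and server names are assumptions for the example, not details from the disclosure.

```python
# Hedged sketch: capacity scales by growing or shrinking the load balancer pool;
# the demultiplexer hashes each flow key onto the current pool.
import hashlib

load_balancers = ["lb-01", "lb-02", "lb-03"]       # adjust capacity by resizing this pool

def demux(flow_key: str) -> str:
    digest = int(hashlib.md5(flow_key.encode()).hexdigest(), 16)
    return load_balancers[digest % len(load_balancers)]

print(demux("198.51.100.7:443->203.0.113.9:55012"))
```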
Abstract:
The subject disclosure is directed towards predicting the compressibility of a data block, and using the predicted compressibility to determine whether compressing the data block will yield enough benefit to justify the compression. In one aspect, data of the data block is processed to obtain an entropy estimate of the data block, e.g., based upon distinct value estimation. The compressibility prediction may be used in conjunction with a chunking mechanism of a data deduplication system.
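The entropy-based decision can be sketched as follows. This illustration computes an exact byte-frequency entropy rather than the distinct value estimation mentioned above, and the 6-bits-per-byte threshold is an arbitrary assumption for the example.

```python
# Hedged sketch: estimate bits of entropy per byte and skip compression when
# the block looks close to incompressible.
import math
from collections import Counter

def entropy_bits_per_byte(block: bytes) -> float:
    counts = Counter(block)
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def worth_compressing(block: bytes, threshold: float = 6.0) -> bool:
    return entropy_bits_per_byte(block) < threshold   # low entropy => likely compressible

print(worth_compressing(b"aaaaaaaaabbbbbbbcc" * 16))  # True: highly repetitive data
```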
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index and/or indexing operations are adaptable to balance deduplication performance savings, throughput and resource consumption. The indexing service may employ hierarchical chunking using different levels of granularity corresponding to chunk size, a sampled compact index table that contains compact signatures for less than all of the hash index's (or subspace's) hash values, and/or selective subspace indexing based on similarity of a subspace's data to another subspace's data and/or to incoming data chunks.
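One way to picture the sampled compact index table mentioned above is sketched below; the sampling rule (roughly 1 in 16 hashes) and the two-byte signature width are assumptions for illustration only.

```python
# Hedged sketch: only a sample of hashes get in-memory entries, and each entry
# stores a short signature plus a pointer into the full index, trading a little
# deduplication savings for much lower memory use.
SAMPLE_BITS = 4            # index roughly 1 in 2**SAMPLE_BITS hashes
SIGNATURE_BYTES = 2        # compact signature kept in memory

compact_table: dict[bytes, int] = {}    # signature -> offset into the full index

def maybe_index(chunk_hash: bytes, full_index_offset: int) -> None:
    if chunk_hash[0] >> (8 - SAMPLE_BITS) == 0:          # sampling condition
        compact_table[chunk_hash[:SIGNATURE_BYTES]] = full_index_offset

def probable_match(chunk_hash: bytes) -> int | None:
    # A hit only means the full index is worth consulting; misses may still be duplicates.
    return compact_table.get(chunk_hash[:SIGNATURE_BYTES])
```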
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and a look-ahead cache in RAM that operate to reduce the I/O needed to access the secondary storage device during deduplication operations. Also described are a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.
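A minimal sketch of the lookup order this implies is shown below: the RAM-resident look-ahead cache is consulted first, then the compact index table, and the on-disk hash index is touched only when a match is plausible. The two-byte signature and the read_from_disk stand-in are assumptions for the example.

```python
# Hedged sketch: in-RAM structures filter lookups so most chunk hashes never
# cause an I/O against the on-disk hash index.
lookahead_cache: dict[bytes, int] = {}   # recently seen hash -> chunk location
compact_table: set[bytes] = set()        # short signatures of hashes present on disk

def lookup(chunk_hash: bytes, read_from_disk) -> int | None:
    if chunk_hash in lookahead_cache:            # hit in RAM, no I/O
        return lookahead_cache[chunk_hash]
    if chunk_hash[:2] not in compact_table:      # signature absent: treat as a new chunk
        return None
    location = read_from_disk(chunk_hash)        # single targeted secondary-storage access
    if location is not None:
        lookahead_cache[chunk_hash] = location   # remember for the rest of the session
    return location
```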