Abstract:
According to one aspect, a distributed database system is configured to manage multi-writer operations on a distributed database by implementing one or more catamorphic database operators. Catamorphic operators can be implemented on the system and executed with little or no reconciliation logic. Catamorphic operators define sets of catamorphic operations and their respective execution logic, such that the order in which catamorphic operations are executed does not affect the final result.
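A minimal sketch of the order-independence property follows. It assumes that a catamorphic operator can be modeled as a merge function that is commutative, associative, and idempotent; the names `merge_max`, `merge_add_to_set`, and `apply_ops` are hypothetical and are not drawn from the system described above. The example only demonstrates that every ordering of the same writes converges on one result.

```python
from functools import reduce
from itertools import permutations


# Hypothetical catamorphic operators: each merge is commutative, associative
# and idempotent, so writers may apply the same operations in any order.
def merge_max(state, value):
    """Keep the largest value seen so far."""
    return value if state is None or value > state else state


def merge_add_to_set(state, value):
    """Add an element to a set; a repeated element has no further effect."""
    return (state or frozenset()) | {value}


def apply_ops(ops, merge):
    """Fold a sequence of writes into a final state using one operator."""
    return reduce(merge, ops, None)


writes = [3, 7, 5, 7]

# Collect the final state produced by every possible execution order.
max_results = {apply_ops(order, merge_max) for order in permutations(writes)}
set_results = {apply_ops(order, merge_add_to_set) for order in permutations(writes)}

print(max_results)       # {7}
print(len(set_results))  # 1: all orderings agree
```

Because every ordering yields a single final state, no reconciliation step is needed between writers. A plain "overwrite with value" operator would fail the same check, since its result depends on which write happens to arrive last.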
Abstract:
A durable memory-mapped database system includes a first memory-mapped view of a database, a second memory-mapped view of the database, a journal buffer and a journal. The first memory-mapped view of the database is a protected view and includes copies of a plurality of datafiles from the database. The second memory-mapped view of the database is a write view and includes copies of the plurality of datafiles. The journal buffer is a buffer in random access memory configured to record datafile updates. The journal is configured to periodically receive recorded datafile updates from the journal buffer.
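The interplay of the two views, the journal buffer, and the journal can be sketched as follows. This is a simplified model under stated assumptions: ordinary `bytearray` objects stand in for the memory-mapped views (a real implementation would map the datafiles, for instance with Python's `mmap` module), a list stands in for the on-disk journal, and the class and method names are hypothetical. The sketch also assumes that the protected view is brought up to date only after updates have reached the journal.

```python
class DurableStore:
    """Simplified model of the two-view journaling scheme.

    bytearrays stand in for the memory-mapped views of the datafiles, and a
    Python list stands in for the on-disk journal file.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.protected_view = bytearray(size)  # updated only after journaling
        self.write_view = bytearray(size)      # receives writes directly
        self.journal_buffer = []               # RAM buffer of (offset, bytes) updates
        self.journal = []                      # durable record of updates

    def write(self, offset: int, data: bytes) -> None:
        """Apply an update to the write view and record it in the journal buffer."""
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside datafile bounds")
        self.write_view[offset:offset + len(data)] = data
        self.journal_buffer.append((offset, bytes(data)))

    def flush_journal(self) -> None:
        """Periodic step: move buffered updates into the journal, then apply
        them to the protected view (this ordering is an assumption)."""
        self.journal.extend(self.journal_buffer)
        for offset, data in self.journal_buffer:
            self.protected_view[offset:offset + len(data)] = data
        self.journal_buffer.clear()

    def recover(self) -> bytearray:
        """Rebuild datafile contents by replaying the journal, e.g. after a
        crash. Assumes the datafiles started zero-filled."""
        view = bytearray(self.size)
        for offset, data in self.journal:
            view[offset:offset + len(data)] = data
        return view


store = DurableStore(16)
store.write(0, b"abc")
before_flush = bytes(store.protected_view[:3])  # protected view not yet updated
store.flush_journal()
after_flush = bytes(store.protected_view[:3])   # update now visible
```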
Abstract:
Database systems and methods that implement a data aggregation framework are provided. The framework can be configured to optimize aggregate operations over non-relational distributed databases, including, for example, data access, data retrieval, data writes, and indexing. Various embodiments are configured to aggregate multiple operations and/or commands, where the results (e.g., database documents and computations) captured from the distributed database are transformed as they pass through an aggregation operation. The aggregation operation can be defined as a pipeline in which the results of a first operation are redirected into the input of a subsequent operation, whose output can in turn be redirected into further operations. Computations may also be executed at each stage of the pipeline, where the computation evaluates each result at that stage to produce a result. Execution of the pipeline can be optimized based on data dependencies and by re-ordering the pipeline operations.
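A minimal sketch of such a pipeline, with one dependency-based reordering rule, is given below. The stage model (each stage declares the fields it reads and writes), the function names, and the rule itself (move a filter ahead of any stage that does not produce a field the filter reads) are illustrative assumptions, not the framework's actual optimizer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Stage:
    name: str
    reads: frozenset   # fields the stage depends on
    writes: frozenset  # fields the stage creates or modifies
    run: Callable[[Iterable[dict]], Iterator[dict]]


def match_stage(field, pred):
    """Filter stage: keep documents whose `field` satisfies `pred`."""
    return Stage(f"match:{field}", frozenset({field}), frozenset(),
                 lambda docs: (d for d in docs if pred(d[field])))


def add_field_stage(name, inputs, fn):
    """Computation stage: add a field computed from each document."""
    return Stage(f"add:{name}", frozenset(inputs), frozenset({name}),
                 lambda docs: ({**d, name: fn(d)} for d in docs))


def run_pipeline(stages, docs):
    """Feed the output of each stage into the input of the next."""
    for stage in stages:
        docs = stage.run(docs)
    return list(docs)


def optimize(stages):
    """Move each filter as early as its data dependencies allow.

    Valid here only because every stage is a per-document filter or map;
    grouping stages would need their own rules.
    """
    plan = []
    for stage in stages:
        pos = len(plan)
        if stage.name.startswith("match:"):
            # Hop over earlier stages that do not write a field this filter reads.
            while pos > 0 and not (stage.reads & plan[pos - 1].writes):
                pos -= 1
        plan.insert(pos, stage)
    return plan


docs = [{"item": "a", "price": 10, "qty": 2},
        {"item": "b", "price": 5, "qty": 1},
        {"item": "c", "price": 8, "qty": 0}]
pipeline = [add_field_stage("total", ["price", "qty"], lambda d: d["price"] * d["qty"]),
            match_stage("qty", lambda q: q > 0)]
plan = optimize(pipeline)
print([s.name for s in plan])  # ['match:qty', 'add:total']
```

Running the filter first means the computation is skipped for documents that would be discarded anyway, while the output is unchanged.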
Abstract:
Systems and methods are provided to enable control and placement of data repositories. In some embodiments, the system segments data into zones. A website, for example, may need to segment data according to location. In this example, a zone may be created for North America and another zone may be created for Europe. Data related to operations executed in North America, for example, can be placed in the North America zone and data related to transactions in Europe can be placed in the Europe zone. According to some embodiments, the system may use zones to accommodate a range of deployment scenarios.
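Zone-based placement can be sketched as a two-step lookup: map a record to its zone, then pick one of the shards assigned to that zone. The zone table, shard names, the `country` field, and the use of a CRC32 hash to spread records within a zone are hypothetical choices made for this illustration.

```python
import zlib

# Hypothetical zone configuration: which countries belong to each zone and
# which shards are tagged with that zone.
ZONE_OF_COUNTRY = {"US": "NorthAmerica", "CA": "NorthAmerica", "MX": "NorthAmerica",
                   "DE": "Europe", "FR": "Europe", "IE": "Europe"}
SHARDS_BY_ZONE = {"NorthAmerica": ["shard-na-1", "shard-na-2"],
                  "Europe": ["shard-eu-1"]}
DEFAULT_SHARDS = ["shard-global-1"]  # for data that belongs to no zone


def place(doc):
    """Choose a shard for a document: first by zone, then by a stable hash
    of its _id so documents spread across the zone's shards."""
    zone = ZONE_OF_COUNTRY.get(doc["country"])
    shards = SHARDS_BY_ZONE.get(zone, DEFAULT_SHARDS)
    # crc32 is used instead of hash() because it is stable across processes.
    return shards[zlib.crc32(str(doc["_id"]).encode()) % len(shards)]


print(place({"_id": 1, "country": "FR"}))  # shard-eu-1
```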
Abstract:
According to one aspect, a horizontally scaled database architecture is provided. Partitioning a database enables efficient distribution of data across a number of systems, reducing the processing costs associated with multiple machines. According to some aspects, the partitioned database can be managed as a single source interface to handle client requests. Further, it is realized that by identifying and testing key properties, horizontal scaling architectures can be implemented and operated with minimal overhead. In one embodiment, databases can be partitioned in an order-preserving manner such that the overhead associated with moving the data for a given partition can be minimized during management of the data and/or database. In one embodiment, split and migration operations prioritize zero-cost partitions, thereby reducing the computational burden associated with managing a partitioned database.
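The sketch below illustrates order-preserving range partitioning and a migration policy that favors cheap partitions. It assumes, as a hypothetical reading of "zero-cost partition", a partition that holds no documents, such as the empty upper range created by splitting just above the current maximum of a monotonically increasing key; the class and method names are likewise illustrative.

```python
import bisect
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    lo: float      # inclusive lower bound of the key range
    hi: float      # exclusive upper bound of the key range
    shard: str
    keys: list = field(default_factory=list)  # sorted keys stored in this chunk


class RangePartitioner:
    """Order-preserving partitioning: chunks cover contiguous, sorted key ranges."""

    def __init__(self, shard):
        self.chunks = [Chunk(-math.inf, math.inf, shard)]

    def _index(self, key):
        # Chunks are sorted by lower bound, so a binary search finds the owner.
        return bisect.bisect_right([c.lo for c in self.chunks], key) - 1

    def chunk_for(self, key):
        return self.chunks[self._index(key)]

    def insert(self, key):
        bisect.insort(self.chunk_for(key).keys, key)

    def split(self, i, at):
        """Split chunk i at key `at`; keys stay in sorted order across chunks."""
        c = self.chunks[i]
        if not c.lo < at < c.hi:
            raise ValueError("split point outside chunk")
        cut = bisect.bisect_left(c.keys, at)
        self.chunks.insert(i + 1, Chunk(at, c.hi, c.shard, c.keys[cut:]))
        c.hi, c.keys = at, c.keys[:cut]

    def pick_migration(self, src):
        """Prefer the chunk on `src` that is cheapest to move (zero-cost if empty)."""
        return min((c for c in self.chunks if c.shard == src),
                   key=lambda c: len(c.keys))

    def migrate(self, chunk, dst):
        """Reassign a chunk; returns the number of keys that had to be copied."""
        moved = len(chunk.keys)
        chunk.shard = dst
        return moved


p = RangePartitioner("shard-A")
for k in range(100):   # monotonically increasing keys
    p.insert(k)
p.split(0, 100)        # split just above the current maximum: upper chunk is empty
cheapest = p.pick_migration("shard-A")
moved = p.migrate(cheapest, "shard-B")
print(moved)           # 0: no data copied
p.insert(150)          # later inserts land on shard-B
```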
Abstract:
Aspects of the present invention are directed to systems and methods for optimizing the identification of locations within a search area using hash values. A hash value represents location information in a single-dimension format. Computing points around a location includes calculating an identification boundary that surrounds the location of interest based on the location's hash value. The identification boundary is expanded until it exceeds a search area defined by the location and a distance. Points around the location can be identified based on having associated hash values that fall within the identification boundary. Hashing operations let a system reduce the geometric work (i.e., searching inside boundaries) and processing required by computing straightforward operations on hash quantities (e.g., searching a linear range of geohashes) instead of, for example, point-to-point comparisons.
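A compact sketch of the hash-range search follows. It assumes a geohash-style integer encoding (longitude and latitude bisection bits interleaved), measures distance in degrees on a flat plane for simplicity, and expands the identification boundary by coarsening the location's own cell one bit at a time; `encode`, `cell_bounds`, `boundary`, and `points_near` are hypothetical names. A production geohash search would usually also examine neighboring cells, because a search area that straddles a major cell edge forces this single-cell boundary to grow very large.

```python
import bisect
import math

BITS = 20  # hypothetical precision: 10 longitude bits + 10 latitude bits


def encode(lat, lon, bits=BITS):
    """Interleave longitude/latitude bisection bits into one integer."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    h = 0
    for i in range(bits):
        if i % 2 == 0:  # even bits refine longitude
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:           # odd bits refine latitude
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        h = (h << 1) | bit
    return h


def cell_bounds(prefix, nbits):
    """Return (lat_lo, lat_hi, lon_lo, lon_hi) of the cell named by `prefix`."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    for i in range(nbits):
        bit = (prefix >> (nbits - 1 - i)) & 1
        if i % 2 == 0:
            mid = (lon_lo + lon_hi) / 2
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
    return lat_lo, lat_hi, lon_lo, lon_hi


def boundary(lat, lon, radius, bits=BITS):
    """Coarsen the location's cell until it covers the search box; return the
    half-open hash range [start, end) of that cell."""
    h = encode(lat, lon, bits)
    for nbits in range(bits, -1, -1):
        prefix = h >> (bits - nbits)
        la0, la1, lo0, lo1 = cell_bounds(prefix, nbits)
        if (la0 <= lat - radius and lat + radius < la1
                and lo0 <= lon - radius and lon + radius < lo1):
            break  # otherwise the loop ends at nbits == 0 (the whole world)
    shift = bits - nbits
    return prefix << shift, (prefix + 1) << shift


def points_near(index, lat, lon, radius):
    """index: sorted list of (hash, lat, lon). Scan one linear hash range,
    then discard candidates outside the exact distance."""
    start, end = boundary(lat, lon, radius)
    i = bisect.bisect_left(index, (start,))
    found = []
    while i < len(index) and index[i][0] < end:
        _, plat, plon = index[i]
        if math.hypot(plat - lat, plon - lon) <= radius:
            found.append((plat, plon))
        i += 1
    return found


points = [(40.7, -74.0), (40.8, -73.9), (34.0, -118.2), (51.5, -0.1)]
index = sorted((encode(la, lo), la, lo) for la, lo in points)
nearby = points_near(index, 40.75, -73.95, 0.2)
```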
Abstract:
Systems and methods are provided for managing asynchronous replication in a distributed database environment while providing for scaling of the distributed database. A cluster of nodes can be assigned roles for managing partitions of data within the database and processing database requests. In one embodiment, each cluster includes a node with a primary role that processes write operations and manages asynchronous replication of the operations to at least one secondary node. Each cluster or set of nodes can host one or more partitions of database data, and the clusters can be grouped into a shard cluster that hosts the data of the distributed database. Each shard can be configured to manage the size of any hosted partitions, split database partitions, migrate partitions, and manage expansion of the shard cluster to encompass new systems.
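The primary/secondary interaction can be sketched as a primary that appends each accepted write to an ordered operation log, and secondaries that later pull and apply the entries they have not yet seen. This is a single-process illustration with hypothetical names (`Node`, `ReplicaSet`, `oplog`); it omits elections, durability, and network failures, and shows only why a secondary may temporarily lag the primary.

```python
class Node:
    """A replica holding a copy of the data and its position in the log."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, op):
        key, value = op
        self.data[key] = value
        self.applied += 1


class ReplicaSet:
    """One primary accepts writes; secondaries catch up asynchronously."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.oplog = []  # ordered record of writes accepted by the primary

    def write(self, key, value):
        op = (key, value)
        self.primary.apply(op)
        self.oplog.append(op)  # returns without waiting for secondaries

    def replicate(self, secondary, batch=None):
        """Secondary pulls the entries it has not applied yet, optionally
        limited to `batch` entries."""
        pending = self.oplog[secondary.applied:]
        if batch is not None:
            pending = pending[:batch]
        for op in pending:
            secondary.apply(op)


s1, s2 = Node("s1"), Node("s2")
rs = ReplicaSet(Node("p"), [s1, s2])
rs.write("a", 1)
rs.write("b", 2)
rs.replicate(s1, batch=1)  # s1 has caught up partially
partial = dict(s1.data)
rs.replicate(s1)           # s1 now matches the primary; s2 still lags
```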