Abstract:
Database systems and methods that implement a data aggregation framework are provided. The framework can be configured to optimize aggregate operations over non-relational distributed databases, including, for example, data access, data retrieval, data writes, indexing, etc. Various embodiments are configured to aggregate multiple operations and/or commands, where the results (e.g., database documents and computations) captured from the distributed database are transformed as they pass through an aggregation operation. The aggregation operation can be defined as a pipeline which enables the results from a first operation to be redirected into the input of a subsequent operation, which output can be redirected into further subsequent operations. Computations may also be executed at each stage of the pipeline, where each result at each stage can be evaluated by the computation to return a result. Execution of the pipeline can be optimized based on data dependencies and re-ordering of the pipeline operations.
Abstract:
A durable memory-mapped database system includes a first memory-mapped view of a database, a second memory-mapped view of the database, a journal buffer and a journal. The first memory-mapped view of the database is a protected view and includes copies of a plurality of datafiles from the database. The second memory-mapped view of the database is a write view and includes copies of the plurality of datafiles. The journal buffer is a buffer in random access memory configured to record datafile updates. The journal is configured to periodically receive recorded datafile updates from the journal buffer.
Abstract:
Aspects of the present invention are directed to system and methods for optimizing identification of locations within a search area using hash values. A hash value represents location information in a single dimension format. Computing points around some location includes calculating an identification boundary that surrounds the location of interest based on the location's hash value. The identification boundary is expanded until it exceeds a search area defined by the location and a distance. Points around the location can be identified based on having associated hash values that fall within the identification boundary. Hashing operations let a system reduce the geometric work (i.e. searching inside boundaries) and processing required, by computing straightforward operations on hash quantities (e.g. searching a linear range of geohashes), instead of, for example, point to point comparisons.
Abstract:
Aspects of the present invention are directed to system and methods for optimizing identification of locations within a search area using hash values. A hash value represents location information in a single dimension format. Computing points around some location includes calculating an identification boundary that surrounds the location of interest based on the location's hash value. The identification boundary is expanded until it exceeds a search area defined by the location and a distance. Points around the location can be identified based on having associated hash values that fall within the identification boundary. Hashing operations let a system reduce the geometric work (i.e. searching inside boundaries) and processing required, by computing straightforward operations on hash quantities (e.g. searching a linear range of geohashes), instead of, for example, point to point comparisons.
Abstract:
According to one aspect, a distributed database system is configured to manage write operations received from database clients and execute the write operations at primary nodes. The system then replicates received operations across a plurality of secondary nodes. Write operation can include safe write requests such that the database guaranties the operation against data loss once acknowledged. In some embodiments, the system incorporates an enhanced arbiter role the enables the arbiter to participate in cluster-wide commitment of data. In other embodiments, the enhanced arbiter role enables secondary nodes to evaluate arbiter operations logs when determining election criteria for new primary nodes.
Abstract:
According to one aspect, a distributed database system is configured to manage write operations received from database clients and execute the write operations at primary nodes. The system then replicates received operations across a plurality of secondary nodes. Write operation can include safe write requests such that the database guaranties the operation against data loss once acknowledged. In some embodiments, the system incorporates an enhanced arbiter role the enables the arbiter to participate in cluster-wide commitment of data. In other embodiments, the enhanced arbiter role enables secondary nodes to evaluate arbiter operations logs when determining election criteria for new primary nodes.
Abstract:
According to one aspect, a distributed database system is configured to manage multi-writer operations on a distributed database by implementing one or more catamorphic database operators. Catamorphic operators can be architected on the system, and executed with little or no reconciliation logic. Catamorphic operators define sets of catamorphic operations and respective execution logic where the order of execution of catamorphic operations is not relevant to a final result.
Abstract:
Provided are systems and methods for managing asynchronous replication in a distributed database environment, wherein a cluster of nodes are assigned roles for processing database requests. In one embodiment, the system provides a node with a primary role to process write operations against its database, generate an operation log reflecting the processed operations, and permit asynchronous replication of the operations to at least one secondary node. In another embodiment, the primary node is the only node configured to accept write operations. Both primary and secondary nodes can process read operations. Although in some settings read requests can be restricted to secondary nodes or the primary node. In one embodiment, the systems and methods provide for automatic failover of the primary node role, can include a consensus election protocol for identifying the next primary node. Further, the systems and methods can be configured to automatically reintegrate a failed primary node.
Abstract:
Database systems and methods that implement a data aggregation framework are provided. The framework can be configured to optimize aggregate operations over non-relational distributed databases, including, for example, data access, data retrieval, data writes, indexing, etc. Various embodiments are configured to aggregate multiple operations and/or commands, where the results (e.g., database documents and computations) captured from the distributed database are transformed as they pass through an aggregation operation. The aggregation operation can be defined as a pipeline which enables the results from a first operation to be redirected into the input of a subsequent operation, which output can be redirected into further subsequent operations. Computations may also be executed at each stage of the pipeline, where each result at each stage can be evaluated by the computation to return a result. Execution of the pipeline can be optimized based on data dependencies and re-ordering of the pipeline operations.
Abstract:
Database systems and methods that implement a data aggregation framework are provided. The framework can be configured to optimize aggregate operations over non-relational distributed databases, including, for example, data access, data retrieval, data writes, indexing, etc. Various embodiments are configured to aggregate multiple operations and/or commands, where the results (e.g., database documents and computations) captured from the distributed database are transformed as they pass through an aggregation operation. The aggregation operation can be defined as a pipeline which enables the results from a first operation to be redirected into the input of a subsequent operation, which output can be redirected into further subsequent operations. Computations may also be executed at each stage of the pipeline, where each result at each stage can be evaluated by the computation to return a result. Execution of the pipeline can be optimized based on data dependencies and re-ordering of the pipeline operations.