Abstract:
A method is provided for redistributing job data from a first datanode (DN) to at least one additional DN in a Massively Parallel Processing (MPP) Database (DB). The method includes recording a snapshot of the job data, creating a first data portion in the first DN and a redistribution data portion in the first DN, collecting changes to a job data copy stored in a temporary table, and initiating transfer of the redistribution data portion to the at least one additional DN.
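The redistribution steps can be illustrated with a minimal in-memory sketch. Everything here is hypothetical: the function names, the modulo placement rule, the plain list standing in for the collected changes, and in particular the final replay step, which the abstract does not describe.

```python
def split_snapshot(job_data, num_nodes):
    """Record a snapshot and split it into the portion that stays on node 0
    and the portion to redistribute (modulo placement is an assumption)."""
    snapshot = dict(job_data)
    first_portion = {k: v for k, v in snapshot.items() if k % num_nodes == 0}
    redistribution_portion = {k: v for k, v in snapshot.items() if k % num_nodes != 0}
    return snapshot, first_portion, redistribution_portion


def transfer(redistribution_portion, num_nodes):
    """Group the redistribution portion by its target node."""
    targets = {node: {} for node in range(1, num_nodes)}
    for key, value in redistribution_portion.items():
        targets[key % num_nodes][key] = value
    return targets


NUM_NODES = 2
job_data = {key: f"row{key}" for key in range(6)}
snapshot, first_portion, redistribution_portion = split_snapshot(job_data, NUM_NODES)

# Changes that arrive while redistribution is in progress are collected, not applied.
change_log = [(3, "row3-v2"), (7, "row7")]

nodes = {0: first_portion, **transfer(redistribution_portion, NUM_NODES)}

# Assumption: collected changes are replayed on the owning node after the transfer.
for key, value in change_log:
    nodes[key % NUM_NODES][key] = value
```

The snapshot stays untouched by the later changes, which is what lets the split and the transfer work on a stable view of the data.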
Abstract:
System and method embodiments are provided for improving the performance of query processing in a massively parallel processing (MPP) database system by pushing down join query processing to data nodes recursively. An embodiment method includes receiving, at a coordinator process, a join query associated with a plurality of tables of the MPP database system, generating, at the coordinator process, an execution plan tree for the join query, and processing, at each of a plurality of data nodes communicating with the coordinator process, the execution plan tree to obtain join query results. The method further includes, upon detecting a next join operator below a top join operator in the execution plan tree at each of the data nodes, forwarding to the other data nodes a sub-tree for the next join operator, and receiving, at each of the data nodes from the other data nodes, sub-tree processing results.
Abstract:
An embodiment method for massively parallel processing includes initiating a management instance on an initial machine, the management instance generating an initial partition corresponding to the initial machine, determining a total number of partitions desired for processing a database, the total number of partitions including the initial partition, determining a number of additional machines available to process the database, grouping the initial machine and the additional machines together in a pod, and launching the management instance on the additional machines in the pod to generate the total number of partitions desired for the database. Additional embodiment methods and an embodiment system operable to perform such methods are also disclosed.
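As a rough illustration of the pod set-up, the sketch below groups the initial machine with the additional machines and spreads the desired number of partitions across them. The function name `build_pod`, the machine names, and the round-robin placement are assumptions made for the example; the abstract does not specify how partitions are assigned within a pod.

```python
def build_pod(initial_machine, additional_machines, total_partitions):
    """Return a mapping of machine -> partition ids for one pod."""
    pod = [initial_machine] + list(additional_machines)
    assignment = {machine: [] for machine in pod}

    # The management instance on the initial machine generates partition 0.
    assignment[initial_machine].append(0)

    # Remaining partitions are generated by instances launched across the pod
    # (round-robin placement is an illustrative choice).
    for partition_id in range(1, total_partitions):
        machine = pod[partition_id % len(pod)]
        assignment[machine].append(partition_id)
    return assignment


assignment = build_pod("m0", ["m1", "m2"], total_partitions=8)
```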
Abstract:
An embodiment method for massively parallel processing includes assigning a primary key to a first table in a database and a foreign key to a second table in the database, the foreign key of the second table identical to the primary key of the first table, determining a number of partition groups desired for the database, partitioning the first table into first partitions based on the primary key assigned and the number of partition groups desired, partitioning the second table into second partitions based on the foreign key assigned and the number of partition groups desired, and distributing the first partitions and the second partitions to the partition groups as partitioned. An embodiment system for implementing the embodiment methods is also disclosed.
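A small sketch shows why partitioning both tables on the same key value is useful: rows that join with each other land in the same partition group, so each group can be joined locally. The table contents, the column names, and the CRC32-modulo placement function are assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict


def partition_group(key, num_groups):
    """Map a key to a partition group with a stable hash (CRC32 is an assumed choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_groups


def partition_table(rows, key_column, num_groups):
    """Split rows into partitions according to the value in key_column."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_group(row[key_column], num_groups)].append(row)
    return groups


NUM_GROUPS = 3
# First table: customer_id is the primary key.
customers = [{"customer_id": i, "name": f"c{i}"} for i in range(1, 7)]
# Second table: customer_id is the foreign key.
orders = [{"order_id": 100 + i, "customer_id": (i % 6) + 1} for i in range(12)]

customer_parts = partition_table(customers, "customer_id", NUM_GROUPS)
order_parts = partition_table(orders, "customer_id", NUM_GROUPS)


def local_join(group_id):
    """Join orders to customers using only the rows held by one partition group."""
    by_id = {c["customer_id"]: c for c in customer_parts.get(group_id, [])}
    return [(o["order_id"], by_id[o["customer_id"]]["name"])
            for o in order_parts.get(group_id, [])]


joined = [pair for group_id in range(NUM_GROUPS) for pair in local_join(group_id)]
```

Every order finds its customer inside its own group, so no rows need to move between groups to complete the join.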
Abstract:
A database system comprises a persistent storage device, a log node including a memory and a processor, and a plurality of database nodes. A database node includes a cache memory configured to store a database instance, and a processor configured to initiate a database transaction by sending a snapshot request to the log node, the snapshot request including a list of pages that were either replaced or newly loaded in the cache memory. The log node processor is configured to send a snapshot response to the database node, wherein the snapshot response includes a snapshot of the database and a list of changed pages of the database instance. The database node processor is configured to update the status of the pages in the cache memory according to the snapshot response and perform the database transaction.
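The request and response exchange can be sketched as follows. This is a toy model under several assumptions that are not in the abstract: the snapshot is represented by a single log sequence number (`snapshot_lsn`), the log node tracks which pages each database node caches, freshly loaded pages are treated as current, and all class and field names are hypothetical.

```python
class LogNode:
    def __init__(self):
        self.lsn = 0                # latest log sequence number
        self.page_lsn = {}          # page id -> LSN of its last change
        self.cached = {}            # node id -> set of page ids in that node's cache
        self.last_snapshot = {}     # node id -> LSN of the last snapshot sent to it

    def record_change(self, page_id):
        """Log a modification of a page made by any database node."""
        self.lsn += 1
        self.page_lsn[page_id] = self.lsn

    def handle_snapshot_request(self, node_id, loaded, replaced):
        """Update the view of the node's cache and report pages changed since its last snapshot."""
        pages = self.cached.setdefault(node_id, set())
        pages.difference_update(replaced)
        pages.update(loaded)
        since = self.last_snapshot.get(node_id, 0)
        changed = sorted(
            p for p in pages
            if p not in loaded and self.page_lsn.get(p, 0) > since
        )
        self.last_snapshot[node_id] = self.lsn
        return {"snapshot_lsn": self.lsn, "changed_pages": changed}


log_node = LogNode()
log_node.record_change("p1")
first = log_node.handle_snapshot_request("node_a", loaded=["p1", "p2"], replaced=[])

log_node.record_change("p2")  # another database node modifies p2
second = log_node.handle_snapshot_request("node_a", loaded=["p3"], replaced=["p1"])

# The database node updates the status of its cached pages before running the transaction.
cache_status = {"p2": "valid", "p3": "valid"}
for page in second["changed_pages"]:
    cache_status[page] = "stale"
```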
Abstract:
A massively parallel processing (MPP) database can be re-partitioned or re-balanced while remaining on-line through a staged migration procedure. Staged migration may include a first stage and a second stage. During the first stage, entries in an existing partition are re-allocated to a new partition, and the catalog is updated to associate the re-allocated entries with both the existing partition and the new partition such that queries for the re-allocated entries are directed toward both the existing partition and the new partition. During the second stage, the re-allocated entries are migrated from the existing partition to the new partition, and after the migration is complete, the catalog is re-updated to associate the migrated entries with the new partition only, such that new queries are directed toward the new partition.
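The two stages can be sketched with a dictionary acting as the catalog. The names `stage_one`, `stage_two`, and `lookup`, the partition names, and the choice to move entries one at a time are assumptions for illustration only.

```python
def stage_one(catalog, keys, old, new):
    """Associate the re-allocated entries with both the existing and the new partition."""
    for key in keys:
        catalog[key] = [old, new]


def stage_two(catalog, partitions, keys, old, new):
    """Migrate the entries, then point the catalog at the new partition only."""
    for key in keys:
        partitions[new][key] = partitions[old].pop(key)
    for key in keys:
        catalog[key] = [new]


def lookup(catalog, partitions, key):
    """Direct a query to every partition the catalog lists for the key."""
    for name in catalog[key]:
        if key in partitions[name]:
            return partitions[name][key]
    return None


partitions = {"p_old": {1: "a", 2: "b", 3: "c", 4: "d"}, "p_new": {}}
catalog = {key: ["p_old"] for key in partitions["p_old"]}
moving = [3, 4]

stage_one(catalog, moving, "p_old", "p_new")
during = lookup(catalog, partitions, 3)   # answered while both partitions are listed

stage_two(catalog, partitions, moving, "p_old", "p_new")
after = lookup(catalog, partitions, 3)    # answered by the new partition only
```

Because queries consult both partitions during the first stage, an entry can be found wherever it currently lives, which is what allows the database to stay on-line during the move.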
Abstract:
System and method embodiments are provided for using different storage formats for a primary database and its replicas in a database managed replication (DMR) system. As such, the advantages of both formats can be combined with suitable design complexity and implementation. In an embodiment, data is arranged in a sequence of rows and stored in a first storage format at the primary database. The data arranged in the sequence of rows is also stored in a second storage format at the replica database. The sequence of rows is determined according to the first storage format or the second storage format. The first storage format is a row store (RS) and the second storage format is a column store (CS), or vice versa. In an embodiment, the sequence of rows is determined to improve compression efficiency at the CS.
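To make the effect of row sequence on column-store compression concrete, the sketch below stores the same rows column by column and counts run-length-encoded runs before and after sorting. Run-length encoding and lexicographic sorting are assumptions chosen for the example; the abstract does not name a compression scheme or an ordering rule.

```python
from itertools import groupby


def rle(column):
    """Run-length encode one column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def column_store(rows):
    """Transpose a sequence of rows into a list of columns."""
    return [list(column) for column in zip(*rows)]


def total_runs(rows):
    """Number of RLE runs needed to store all columns of the given row sequence."""
    return sum(len(rle(column)) for column in column_store(rows))


# Row-store view: each tuple is one row (country, status).
rows = [
    ("DE", "open"), ("US", "closed"), ("DE", "closed"),
    ("US", "open"), ("DE", "open"), ("US", "closed"),
]

unsorted_runs = total_runs(rows)
sorted_rows = sorted(rows)          # same rows, different sequence
sorted_runs = total_runs(sorted_rows)
```

With this data the unsorted sequence needs 10 runs and the sorted sequence needs 6, since sorting clusters equal values in the leading column.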