Abstract:
Queries may be processed more efficiently in a massively parallel processing (MPP) database by locally optimizing the global execution plan. The global execution plan and a semantic tree may be provided to MPP data nodes by an MPP coordinator. The MPP data nodes may then use the global execution plan and the semantic tree to generate a local execution plan. Thereafter, the MPP data nodes may select either the global execution plan or the local execution plan in accordance with a cost evaluation.
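As an illustration only, the following Python sketch shows how a data node might choose between a coordinator-supplied global plan and a locally re-optimized plan using estimated cost; the Plan class, cost units, and optimizer callback are hypothetical, not part of the described method.

# Hypothetical sketch: a data node re-plans locally from the semantic tree
# and keeps whichever plan the cost model estimates to be cheaper.

from dataclasses import dataclass


@dataclass
class Plan:
    description: str
    estimated_cost: float  # assumed combined CPU + I/O cost units


def choose_plan(global_plan: Plan, semantic_tree: dict, local_optimizer) -> Plan:
    """Return the cheaper of the coordinator's global plan and a
    locally optimized plan derived from the semantic tree."""
    local_plan = local_optimizer(semantic_tree)
    if local_plan.estimated_cost < global_plan.estimated_cost:
        return local_plan
    return global_plan


# Toy local optimizer that exploits a local index (illustrative only).
def toy_local_optimizer(tree: dict) -> Plan:
    return Plan("index scan on local partition", estimated_cost=40.0)


if __name__ == "__main__":
    global_plan = Plan("sequential scan per coordinator plan", estimated_cost=100.0)
    chosen = choose_plan(global_plan, {"query": "SELECT ..."}, toy_local_optimizer)
    print("Executing:", chosen.description)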
Abstract:
System and method embodiments are provided for using different storage formats for a primary database and its replicas in a database managed replication (DMR) system. As such, the advantages of both formats can be combined at a suitable cost in design and implementation complexity. In an embodiment, data is arranged in a sequence of rows and stored in a first storage format at the primary database. The data arranged in the sequence of rows is also stored in a second storage format at the replica database. The sequence of rows is determined according to the first storage format or the second storage format. The first storage format is a row store (RS) and the second storage format is a column store (CS), or vice versa. In an embodiment, the sequence of rows is determined to improve compression efficiency at the CS.
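A minimal Python sketch of the row-ordering idea, under the assumption that sorting on low-cardinality columns improves run-length encoding in the column store; both the RS primary and the CS replica would then store rows in this same sequence. The column names and data are invented for illustration.

# Hypothetical sketch: choose a row sequence that favors column-store
# compression (long runs of equal values enable run-length encoding),
# then use that same sequence for both the row-store primary and the
# column-store replica.

from itertools import groupby

rows = [
    {"region": "EU", "status": "open",   "amount": 10},
    {"region": "US", "status": "closed", "amount": 25},
    {"region": "EU", "status": "open",   "amount": 7},
    {"region": "US", "status": "open",   "amount": 3},
]

# Order by low-cardinality columns first so equal values cluster together.
ordered = sorted(rows, key=lambda r: (r["region"], r["status"]))


def run_length_encode(values):
    """Represent a column as (value, run length) pairs."""
    return [(v, sum(1 for _ in g)) for v, g in groupby(values)]


for column in ("region", "status"):
    print(column, run_length_encode(r[column] for r in ordered))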
Abstract:
An embodiment method for massively parallel processing includes assigning a primary key to a first table in a database and a foreign key to a second table in the database, the foreign key of the second table being identical to the primary key of the first table, determining a number of partition groups desired for the database, partitioning the first table into first partitions based on the assigned primary key and the number of partition groups desired, partitioning the second table into second partitions based on the assigned foreign key and the number of partition groups desired, and distributing the first partitions and the second partitions to the partition groups. An embodiment system for implementing the embodiment methods is also disclosed.
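A short Python sketch of this co-partitioning idea, assuming a simple hash-modulo mapping from key value to partition group (the hashing scheme and table contents are illustrative assumptions, not taken from the abstract). Rows of the second table land in the same group as the first-table row whose primary key they reference, so joins on that key can run locally within each group.

# Hypothetical sketch: co-partition two tables by primary/foreign key so
# that related rows end up in the same partition group.

NUM_PARTITION_GROUPS = 4  # number of partition groups desired


def partition_group(key) -> int:
    """Map a primary/foreign key value to a partition group."""
    return hash(key) % NUM_PARTITION_GROUPS


first_table = [{"id": i, "name": f"customer-{i}"} for i in range(10)]
second_table = [{"order_id": 100 + i, "customer_id": i % 10} for i in range(30)]

groups = {g: {"first": [], "second": []} for g in range(NUM_PARTITION_GROUPS)}
for row in first_table:
    groups[partition_group(row["id"])]["first"].append(row)
for row in second_table:
    groups[partition_group(row["customer_id"])]["second"].append(row)

# Each group now holds matching customer and order rows.
for g, tables in groups.items():
    print(g, len(tables["first"]), len(tables["second"]))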
Abstract:
A method for data redistribution of job data in a first datanode (DN) to at least one additional DN in a Massively Parallel Processing (MPP) Database (DB) is provided. The method includes recording a snapshot of the job data, creating a first data portion in the first DN and a redistribution data portion in the first DN, collecting changes to a job data copy stored in a temporary table, and initiating transfer of the redistribution data portion to the at least one additional DN.
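The steps above can be sketched in Python as follows; the function names, the predicate that decides which rows stay local, and the transfer callback are assumptions made for illustration, not elements defined by the abstract.

# Hypothetical sketch of the redistribution steps: snapshot the job data,
# split it into a portion that stays on the first DN and a portion to be
# redistributed, capture changes arriving during the move in a temporary
# table, and ship the redistribution portion to another DN.

import copy


def redistribute(job_data, stays_local, send_to_dn):
    snapshot = copy.deepcopy(job_data)                                 # record a snapshot
    first_portion = [r for r in snapshot if stays_local(r)]            # kept on the first DN
    redistribution_portion = [r for r in snapshot if not stays_local(r)]  # to be moved
    temp_table = []  # changes made while the move is in flight would be collected here
    send_to_dn(redistribution_portion)                                 # transfer to additional DN
    return first_portion, temp_table


if __name__ == "__main__":
    data = [{"key": k, "value": k * k} for k in range(8)]
    kept, pending = redistribute(
        data,
        stays_local=lambda row: row["key"] % 2 == 0,
        send_to_dn=lambda rows: print("shipping", len(rows), "rows"),
    )
    print("kept locally:", len(kept))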
Abstract:
Dynamically re-allocating tasks and/or memory quotas amongst work agents in symmetric multiprocessing (SMP) systems can significantly mitigate delays and inefficiencies associated with data skew. For example, unfinished tasks can be reallocated from a busy work agent to an idle work agent upon determining that the idle work agent has finished processing its originally assigned set of tasks. Alternatively, a portion of a memory quota assigned to an idle work agent can be reallocated to a busy work agent for use in processing the remaining tasks. Memory quotas can be re-assigned by releasing the memory quota back into a memory pool once the idle work agent has finished processing its originally assigned tasks, and then reallocating some or all of the memory quota to the busy work agent.
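As an illustration of the task-reallocation variant, the Python sketch below moves part of the busiest agent's unfinished queue to an agent that has gone idle; the queue structure and rebalancing ratio are assumptions for the example, and the memory-quota variant would analogously return and re-grant quota from a shared pool.

# Hypothetical sketch: when a work agent finishes its assigned tasks,
# reassign unfinished tasks from the busiest agent to it instead of
# leaving it idle.

from collections import deque


def rebalance(agent_queues):
    """Move roughly half of the largest queue to an idle agent, if any."""
    idle = [a for a, q in agent_queues.items() if not q]
    if not idle:
        return
    busiest = max(agent_queues, key=lambda a: len(agent_queues[a]))
    moved = [agent_queues[busiest].pop() for _ in range(len(agent_queues[busiest]) // 2)]
    agent_queues[idle[0]].extend(moved)


queues = {"agent0": deque(), "agent1": deque(range(10))}
rebalance(queues)
print({a: len(q) for a, q in queues.items()})  # skewed work is now split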
Abstract:
An embodiment method for massively parallel processing includes initiating a management instance on an initial machine, the management instance generating an initial partition corresponding to the initial machine, determining a total number of partitions desired for processing a database, the total number of partitions including the initial partition, determining a number of additional machines available to process the database, grouping the initial machine and the additional machines together in a pod, and launching the management instance on the additional machines in the pod to generate the total number of partitions desired for the database. Additional embodiment methods and an embodiment system operable to perform such methods are also disclosed.
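A minimal Python sketch of this flow, under the assumption that "launching the management instance" can be modeled as a call that returns one partition descriptor per machine; the function names and pod representation are hypothetical.

# Hypothetical sketch: start a management instance on an initial machine,
# group the available machines into a pod, and launch the instance on the
# additional machines until the desired number of partitions exists.

def launch_management_instance(machine):
    # In a real system this would start a service on the machine;
    # here it just returns a partition descriptor.
    return {"machine": machine, "partition": f"partition-on-{machine}"}


def build_pod(initial_machine, additional_machines, partitions_desired):
    pod = [initial_machine] + list(additional_machines)
    partitions = [launch_management_instance(initial_machine)]  # initial partition
    for machine in additional_machines:
        if len(partitions) >= partitions_desired:
            break
        partitions.append(launch_management_instance(machine))
    return pod, partitions


pod, partitions = build_pod("node-0", ["node-1", "node-2", "node-3"], partitions_desired=3)
print(len(pod), "machines in pod,", len(partitions), "partitions")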