Abstract:
A computer-implemented method and system at a network switch use one or more processors to perform a pre-defined database function on query data contained in data messages received at the network switch, producing result data. In a first mode of operation, the pre-defined database function is performed on the query data to a state of full completion, generating complete result data and no skipped query data; in a second mode of operation, it is performed to a state of partial completion, generating partially complete result data and skipped query data. The method and system further perform one or more network switch functions to route the complete result data, and/or the partially complete result data and skipped query data, to one or more destination nodes. In addition, an application programming interface (API) is used to define the database function.
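A minimal Python sketch of the two completion modes described above is given below; the class name, the per-record database function, and the record budget used to trigger partial completion are illustrative assumptions, not the patented switch interface.

    from typing import Callable, Iterable, List, Tuple

    class SwitchDbFunction:
        def __init__(self, db_function: Callable[[dict], dict]):
            # db_function is a pre-defined per-record database function (e.g. a filter or projection).
            self.db_function = db_function

        def run_full(self, query_data: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
            # First mode: process every record to full completion; nothing is skipped.
            results = [self.db_function(rec) for rec in query_data]
            return results, []

        def run_partial(self, query_data: Iterable[dict], budget: int) -> Tuple[List[dict], List[dict]]:
            # Second mode: stop after `budget` records; the remainder is returned as
            # skipped query data so a destination node can finish the work.
            results, skipped = [], []
            for i, rec in enumerate(query_data):
                if i < budget:
                    results.append(self.db_function(rec))
                else:
                    skipped.append(rec)
            return results, skipped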
Abstract:
A method for cloning data samples in a data set based on statistical information about the data samples. The method does not use any of the data samples themselves to perform the cloning. The statistical information includes a first set of statistical parameters obtained, based on the Eckart-Young theorem, from a data matrix formed by the data entries of the data samples, and a second set of statistical parameters indicating statistical properties of the data entries. The data samples are reconstructed from the first and second sets of statistical parameters based on the Eckart-Young theorem.
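One way to read this is as reconstruction from a truncated singular value decomposition, which the Eckart-Young theorem identifies as the optimal low-rank approximation. The sketch below assumes the first parameter set is a rank-k SVD of the standardized data matrix and the second set is the per-column mean and standard deviation; the rank k and the standardization step are assumptions for illustration.

    import numpy as np

    def extract_parameters(X: np.ndarray, k: int):
        mean, std = X.mean(axis=0), X.std(axis=0) + 1e-12      # second set: per-column statistics
        U, s, Vt = np.linalg.svd((X - mean) / std, full_matrices=False)
        return (U[:, :k], s[:k], Vt[:k]), (mean, std)           # first set: truncated SVD factors

    def clone_samples(svd_params, stats_params) -> np.ndarray:
        # Reconstruct approximate samples from the parameters alone (no raw data needed).
        U_k, s_k, Vt_k = svd_params
        mean, std = stats_params
        return (U_k * s_k) @ Vt_k * std + mean

    X = np.random.default_rng(0).normal(size=(100, 8))
    clones = clone_samples(*extract_parameters(X, k=3))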
Abstract:
A computer-implemented method and system are provided, including executing an application programming interface (API) in a network switch to define at least one of one or more database functions, performing, using one or more processors, the one or more database functions on at least a portion of data contained in a data message received at the switch to generate result data, and routing the result data to one or more destination nodes. A database-function-defined network switch includes a network switch and one or more processors that perform a pre-defined database function on query data contained in data messages received at the switch to produce result data, wherein the pre-defined database function is performed on the query data either in a first mode of operation to a state of full completion, generating complete result data and no skipped query data, or in a second mode of operation to a state of partial completion, generating partially complete result data and skipped query data.
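The following sketch illustrates the API aspect only: registering a database function on the switch and applying it to a message payload before routing. The names register_function and handle_message are placeholders, not the actual switch API.

    from typing import Callable, Dict, List

    class SwitchApi:
        def __init__(self):
            self._functions: Dict[str, Callable[[List[dict]], List[dict]]] = {}

        def register_function(self, name: str, fn: Callable[[List[dict]], List[dict]]):
            # API call used to define a database function on the switch.
            self._functions[name] = fn

        def handle_message(self, name: str, payload: List[dict], destinations: List[str]):
            result = self._functions[name](payload)          # perform the function on the payload
            return {dest: result for dest in destinations}   # route result data to destination nodes

    api = SwitchApi()
    api.register_function("filter_gt_10", lambda rows: [r for r in rows if r["value"] > 10])
    routed = api.handle_message("filter_gt_10", [{"value": 5}, {"value": 42}], ["node-1"])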
Abstract:
Data messages having different priorities may be stored in different communication buffers of a network node. The data messages may then be forwarded from the communication buffers to working buffers as space becomes available in the working buffers. After being forwarded to the working buffers, the data messages may be available to be processed by upper-layer operations of the network node. Priorities may be assigned to the data messages based on a priority level of a query associated with the data messages, a priority level of an upper-layer operation assigned to process the data messages, or combinations thereof.
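A small sketch of this buffering scheme follows, assuming a fixed set of numeric priority levels and a single bounded working buffer; the buffer sizes and the priority ordering (lower number = higher priority) are illustrative assumptions.

    from collections import deque

    class PriorityBuffers:
        def __init__(self, priorities=(0, 1, 2), working_capacity=4):
            self.comm = {p: deque() for p in priorities}   # one communication buffer per priority
            self.working = deque()
            self.capacity = working_capacity

        def enqueue(self, message, priority):
            # Store the message in the communication buffer matching its priority.
            self.comm[priority].append(message)

        def forward(self):
            # Move messages into the working buffer as space becomes available,
            # highest priority first.
            for p in sorted(self.comm):
                while self.comm[p] and len(self.working) < self.capacity:
                    self.working.append(self.comm[p].popleft())

        def next_for_upper_layer(self):
            # Messages in the working buffer are available to upper-layer operations.
            return self.working.popleft() if self.working else None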
Abstract:
System and method embodiments are provided for adaptive vector size selection for vectorized query execution. The adaptive vector size selection is implemented in two stages. In a query planning stage, a suitable vector size is estimated for a query by a query planner. The planning stage includes analyzing a query plan tree, segmenting the tree into segments, and assigning an initial vector size to each segment of the query execution plan. In a subsequent query execution stage, an execution engine monitors hardware performance indicators and adjusts the vector size according to the monitored indicators. Adjusting the vector size includes trying different vector sizes and observing related processor counters: the vector size is increased to improve hardware performance according to the processor counters, and decreased when the processor counters indicate a decrease in hardware performance.
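The two stages could look roughly like the sketch below. The per-segment sizing heuristic (based on estimated row width) and the use of cache-miss rate as the monitored counter are assumptions for illustration, not the system's actual cost model.

    def plan_vector_sizes(plan_segments):
        # Planning stage: assign an initial vector size to each plan-tree segment,
        # here simply sized so a vector of rows fits an assumed 64 KiB budget.
        return {seg["id"]: max(256, 65536 // seg["row_width"]) for seg in plan_segments}

    def adjust_vector_size(current_size, miss_rate, prev_miss_rate):
        # Execution stage: grow the vector while the monitored counter improves,
        # shrink it when the counter indicates worse hardware performance.
        if miss_rate <= prev_miss_rate:
            return min(current_size * 2, 64 * 1024)
        return max(current_size // 2, 256)

    sizes = plan_vector_sizes([{"id": "scan", "row_width": 64}, {"id": "agg", "row_width": 16}])
    sizes["scan"] = adjust_vector_size(sizes["scan"], miss_rate=0.02, prev_miss_rate=0.05)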
Abstract:
System and method embodiments are provided for improving the performance of query processing in a massively parallel processing (MPP) database system by pushing down join query processing to data nodes recursively. An embodiment method includes receiving, at a coordinator process, a join query associated with a plurality of tables of the MPP database system, generating, at the coordinator process, an execution plan tree for the join query, and processing, at each of a plurality of data nodes communicating with the coordinator process, the execution plan tree to obtain join query results. The method further includes, upon detecting a next join operator below a top join operator in the execution plan tree at each of the data nodes, forwarding to the other data nodes a sub-tree for the next join operator, and receiving, at each of the data nodes from the other data nodes, sub-tree processing results.
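A much-simplified sketch of the recursive push-down idea is given below. The plan-node dictionaries, the equi-join on a single "key" column, and the broadcast helper standing in for inter-node communication are hypothetical simplifications of the MPP system's internals.

    def execute_plan(node, local_tables, broadcast):
        # node = {"op": "scan", "table": ...} or {"op": "join", "left": ..., "right": ...}
        if node["op"] == "scan":
            return local_tables[node["table"]]
        left = gather(node["left"], local_tables, broadcast)
        right = gather(node["right"], local_tables, broadcast)
        return [l | r for l in left for r in right if l["key"] == r["key"]]

    def gather(subtree, local_tables, broadcast):
        # Upon detecting a join below the top join, forward its sub-tree to the
        # other data nodes and merge their sub-tree results with the local ones.
        if subtree["op"] == "join":
            remote = broadcast(subtree)
            return execute_plan(subtree, local_tables, broadcast) + remote
        return execute_plan(subtree, local_tables, broadcast)

    local_tables = {"t1": [{"key": 1, "a": "x"}], "t2": [{"key": 1, "b": "y"}]}
    plan = {"op": "join",
            "left": {"op": "scan", "table": "t1"},
            "right": {"op": "scan", "table": "t2"}}
    rows = execute_plan(plan, local_tables, broadcast=lambda subtree: [])  # local-only demo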
Abstract:
System and method embodiments are provided for using different storage formats for a primary database and its replicas in a database managed replication (DMR) system, so that the advantages of both formats can be combined with manageable design and implementation complexity. In an embodiment, data is arranged in a sequence of rows and stored in a first storage format at the primary database. The data arranged in the same sequence of rows is also stored in a second storage format at the replica database. The sequence of rows is determined according to the first storage format or the second storage format. The first storage format is a row store (RS) and the second storage format is a column store (CS), or vice versa. In an embodiment, the sequence of rows is chosen to improve compression efficiency at the CS.
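As one illustration of how the row sequence can favor the CS side, the sketch below orders rows by a low-cardinality column so that the column-store replica compresses well (e.g. with run-length encoding); the choice of sort column and the simplified CS layout are assumptions.

    def order_rows_for_cs(rows, sort_column):
        # Determine the sequence of rows according to the column store's needs.
        return sorted(rows, key=lambda r: r[sort_column])

    def to_column_store(rows):
        # Store the same row sequence column-by-column (simplified CS layout).
        return {col: [r[col] for r in rows] for col in rows[0]}

    rows = [{"region": "EU", "amount": 7}, {"region": "US", "amount": 3}, {"region": "EU", "amount": 5}]
    ordered = order_rows_for_cs(rows, "region")
    row_store = list(ordered)                  # primary keeps the sequence in RS format
    column_store = to_column_store(ordered)    # replica keeps the same sequence in CS format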
Abstract:
Embodiments are provided herein for efficient out-of-order (OOO) execution of multiple queries within a stored procedure in a database processing system. An embodiment method includes compiling a procedure comprising a plurality of statements. During compilation, any dependencies between the statements are detected and maintained in a dependency table. The method further includes executing the procedure; during execution, upon detecting a change in a dependency between the statements, the corresponding entry in the dependency table is updated. The statements are scheduled for OOO execution according to the dependency table with the updated dependencies.
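A minimal scheduling sketch follows. Deriving dependencies from per-statement read and write sets, and batching ready statements into waves, are assumed simplifications of what the compiler and scheduler would actually record.

    def build_dependency_table(statements):
        # statements: list of {"id", "reads": set, "writes": set}
        deps = {s["id"]: set() for s in statements}
        for i, later in enumerate(statements):
            for earlier in statements[:i]:
                if later["reads"] & earlier["writes"] or later["writes"] & earlier["writes"]:
                    deps[later["id"]].add(earlier["id"])
        return deps

    def schedule_ooo(statements, deps):
        done, order = set(), []
        pending = {s["id"] for s in statements}
        while pending:
            ready = {sid for sid in pending if deps[sid] <= done}   # all dependencies satisfied
            order.append(sorted(ready))   # these statements may run out of order / concurrently
            done |= ready
            pending -= ready
        return order

    stmts = [{"id": "s1", "reads": set(), "writes": {"t1"}},
             {"id": "s2", "reads": {"t1"}, "writes": {"t2"}},
             {"id": "s3", "reads": set(), "writes": {"t3"}}]
    print(schedule_ooo(stmts, build_dependency_table(stmts)))   # [['s1', 's3'], ['s2']]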
Abstract:
A method is provided for redistributing job data from a first datanode (DN) to at least one additional DN in a Massively Parallel Processing (MPP) database (DB). The method includes recording a snapshot of the job data, creating a first data portion and a redistribution data portion in the first DN, collecting changes to a job data copy stored in a temporary table, and initiating transfer of the redistribution data portion to the at least one additional DN.
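A hedged sketch of these steps appears below. The hash-based split into a retained portion and a redistribution portion, and the closure standing in for temporary-table delta capture, are illustrative assumptions rather than the method's actual mechanics.

    def redistribute(job_data, first_dn, new_dns):
        snapshot = list(job_data)                      # record a snapshot of the job data
        nodes = [first_dn] + new_dns
        keep = [r for r in snapshot if hash(r["key"]) % len(nodes) == 0]   # first data portion
        move = [r for r in snapshot if hash(r["key"]) % len(nodes) != 0]   # redistribution portion
        temp_table = []                                # changes arriving during redistribution

        def capture_change(row):
            # Collect ongoing changes in the temporary table for later replay.
            temp_table.append(row)

        # Initiate transfer of the redistribution portion to the additional DNs.
        transfers = {dn: [r for r in move if hash(r["key"]) % len(nodes) == i + 1]
                     for i, dn in enumerate(new_dns)}
        return keep, transfers, temp_table, capture_change

    keep, transfers, deltas, on_change = redistribute(
        [{"key": k, "val": k * 10} for k in range(6)], first_dn="dn1", new_dns=["dn2", "dn3"])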