Invention Grant
US08195644B2 System, method, and computer-readable medium for optimization of multiple parallel join operations on skewed data (In force)

  • Patent Title: System, method, and computer-readable medium for optimization of multiple parallel join operations on skewed data
  • Application No.: US12245789
    Application Date: 2008-10-06
  • Publication No.: US08195644B2
    Publication Date: 2012-06-05
  • Inventor: Yu Xu
  • Applicant: Yu Xu
  • Applicant Address: US OH Dayton
  • Assignee: Teradata US, Inc.
  • Current Assignee: Teradata US, Inc.
  • Current Assignee Address: US OH Dayton
  • Agent: Ramin Mahboubian
  • Main IPC: G06F17/30
  • IPC: G06F17/30
Abstract:
A system, method, and computer-readable medium that facilitate management of data skew during a parallel multiple join operation are provided. Portions of the tables involved in the join operation are distributed among a plurality of processing modules, and each processing module is provided with a list of skewed values of the join column of the larger table involved in the join operation. Each processing module scans the rows of the first and second tables distributed to it and compares the join column values of both tables with the list of skewed values. Rows of the larger table that have non-skewed values in the join column are redistributed, and rows of the larger table that have skewed values in the join column are maintained locally at the processing modules. Rows of the smaller table that have non-skewed values in the join column are redistributed, and rows of the smaller table that have skewed values in the join column are duplicated among the processing modules. Rows of a third table involved in the join operation are redistributed based on the join attribute value of the rows. Each processing module then generates a local join data set, and the parallel join is completed by merging the local join data sets of the processing modules.
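The row routing described in the abstract can be illustrated with a small single-process simulation in Python. This is a sketch under stated assumptions, not the patented implementation: the function names (owner, hash_join, skew_aware_join), the (join_value, payload) row layout, and the use of Python's built-in hash as the redistribution function are all hypothetical. It covers only the larger and smaller tables; the redistribution of the third table is left out because the abstract does not say how that table is combined with the other two.

```python
from collections import defaultdict


def owner(value, n_modules):
    """Hypothetical redistribution rule: the module that owns a join value."""
    return hash(value) % n_modules


def hash_join(left, right):
    """Plain in-memory equi-join of two lists of (join_value, payload) rows."""
    index = defaultdict(list)
    for key, payload in right:
        index[key].append(payload)
    return [(key, lp, rp) for key, lp in left for rp in index.get(key, [])]


def skew_aware_join(large_parts, small_parts, skewed):
    """Simulate the routing step for the larger and smaller tables.

    large_parts / small_parts hold one list of rows per processing module.
    skewed is the list of skewed join values that every module receives.
    """
    n = len(large_parts)
    large_redis = [[] for _ in range(n)]  # non-skewed rows of the larger table
    large_local = [[] for _ in range(n)]  # skewed rows of the larger table
    small_redis = [[] for _ in range(n)]  # non-skewed rows of the smaller table
    small_dup = [[] for _ in range(n)]    # skewed rows of the smaller table

    for m in range(n):
        for row in large_parts[m]:
            if row[0] in skewed:
                large_local[m].append(row)  # skewed: stays where it is
            else:
                large_redis[owner(row[0], n)].append(row)
        for row in small_parts[m]:
            if row[0] in skewed:
                for target in range(n):  # skewed: copied to every module
                    small_dup[target].append(row)
            else:
                small_redis[owner(row[0], n)].append(row)

    # Each module builds its local join data set; the sets are then merged.
    merged = []
    for m in range(n):
        merged.extend(hash_join(large_redis[m], small_redis[m]))
        merged.extend(hash_join(large_local[m], small_dup[m]))
    return merged


# Example: join value 1 is heavily skewed in the larger table.
large_parts = [[(1, "r0"), (1, "r1"), (2, "r2")],
               [(1, "r3"), (3, "r4")],
               [(1, "r5"), (4, "r6")]]
small_parts = [[(2, "s0")], [(1, "s1"), (3, "s2")], [(5, "s3")]]
joined = skew_aware_join(large_parts, small_parts, skewed={1})
```

In this sketch the many larger-table rows with join value 1 never move, so no single module receives all of them; only the one matching smaller-table row is copied to every module. Rows with non-skewed values meet at the module chosen by owner, as in an ordinary hash-redistribution join.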