Main Article Content
Although many skew-handling algorithms have been proposed for simple join operations, they remain generally inefficient in the case of θ-join and in the case of multi-join. A new method for self-balancing equi-join operations on shared-nothing (SN) machines is proposed here. It offers deterministic and near-perfect load balancing through flexible control of communications in intra-transaction parallelism. The new algorithm mixes a balanced data-distribution strategy with pure hash-join. Its predictably low join-product- and attribute-value skews make it suitable for repeated use in multi-join operations. Its tradeoff between balancing overhead and speedup is analyzed in the BSP (Bulk-synchronous parallel) computing model. The scalable model predicts a negligible join product skew and a near-linear speed-up in any combination of selectivity, skew and number of processors. This prediction is confirmed by a series of tests.