System and method for handling data skew at run time
Abstract:
A system for handling data skew includes a cluster of computing nodes and a processor. The cluster includes one or more first nodes and one or more second nodes, each of which has a storage. The storage of each second node has a higher access speed than the storage of each first node. The processor is configured to split input data into partitions, to detect whether any of the partitions has data skew, and to assign the partitions for parallel processing: partitions detected as having no data skew are assigned to the first nodes, and partitions detected as having data skew are assigned to the second nodes.
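
The split, detect, and assign steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how the data is partitioned or how skew is detected, so the hash partitioning by key, the size-based skew test (a partition counts as skewed when it holds more than `factor` times the median partition size), the round-robin placement, and all function and node names here are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median
from zlib import crc32


def split_into_partitions(records, num_partitions):
    """Hash-partition (key, value) records by key (assumed partitioning scheme)."""
    buckets = defaultdict(list)
    for key, value in records:
        # crc32 gives a hash that is stable across runs, unlike built-in hash().
        buckets[crc32(str(key).encode()) % num_partitions].append((key, value))
    return [buckets[i] for i in range(num_partitions)]


def is_skewed(partition_size, typical_size, factor=2.0):
    """Assumed skew test: the partition is much larger than a typical one."""
    return partition_size > factor * typical_size


def assign_partitions(partitions, first_nodes, second_nodes, factor=2.0):
    """Map each partition index to a node name.

    Partitions without skew go to the first nodes (slower storage);
    skewed partitions go to the second nodes (faster storage).
    Within each group, nodes are picked round-robin.
    """
    typical = median(len(p) for p in partitions)
    assignment = {}
    n_first = n_second = 0
    for idx, part in enumerate(partitions):
        if is_skewed(len(part), typical, factor):
            assignment[idx] = second_nodes[n_second % len(second_nodes)]
            n_second += 1
        else:
            assignment[idx] = first_nodes[n_first % len(first_nodes)]
            n_first += 1
    return assignment


# Usage: one "hot" key dominates the input, so its partition is skewed.
records = [("hot", i) for i in range(100)] + [(f"k{i}", i) for i in range(20)]
partitions = split_into_partitions(records, num_partitions=4)
first_nodes = ["hdd-node-0", "hdd-node-1"]   # hypothetical slower-storage nodes
second_nodes = ["ssd-node-0"]                # hypothetical faster-storage node
assignment = assign_partitions(partitions, first_nodes, second_nodes)
```

Using the median rather than the mean as the "typical" size keeps one very large partition from inflating the threshold it is compared against. A real system would likely detect skew from run-time statistics rather than from fully materialized partitions.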