Invention Grant
- Patent Title: Automated load-balancing of partitions in arbitrarily imbalanced distributed mapreduce computations
- Application No.: US14320373
- Application Date: 2014-06-30
- Publication No.: US09613127B1
- Publication Date: 2017-04-04
- Inventor: Silvius V. Rus, Wei Jiang
- Applicant: Quantcast Corporation
- Applicant Address: US CA San Francisco
- Assignee: Quantcast Corporation
- Current Assignee: Quantcast Corporation
- Current Assignee Address: US CA San Francisco
- Agent: Robin W. Reasoner; Renee D. Jacowitz
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
A distributed computing system executes a MapReduce job on streamed data that includes an arbitrary amount of imbalance with respect to the frequency distribution of the data keys in the dataset. A map task module maps the dataset to a coarse partitioning, and generates a list of the top K keys with the highest frequency among the dataset. A sort task module employs a plurality of sorters to read the coarse partitioning and sort the data into buckets by data key. The values for the top K most frequent keys are separated into single-key buckets. The other less frequently occurring keys are assigned to buckets that each have multiple keys assigned to them. Then, more than one worker is assigned to each single-key bucket. The output of the multiple workers assigned to each respective single-key bucket is stitched together.
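The bucketing scheme described in the abstract can be illustrated with a minimal single-process Python sketch. It is not the patented implementation: the function names (`top_k_keys`, `assign_buckets`, `reduce_single_key_bucket`), the CRC32 hash used for the multi-key buckets, and the assumption that a bucket's reduction can be split into chunks and recombined are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
import zlib


def top_k_keys(records, k):
    """Return the k most frequent keys among (key, value) records."""
    counts = Counter(key for key, _ in records)
    return [key for key, _ in counts.most_common(k)]


def assign_buckets(records, hot_keys, num_shared_buckets):
    """Give each hot key its own bucket; hash all other keys into shared buckets."""
    single = {key: [] for key in hot_keys}
    shared = defaultdict(list)
    for key, value in records:
        if key in single:
            single[key].append(value)
        else:
            # Assumption: a stable hash (CRC32) picks the multi-key bucket.
            idx = zlib.crc32(key.encode("utf-8")) % num_shared_buckets
            shared[idx].append((key, value))
    return single, dict(shared)


def reduce_single_key_bucket(values, num_workers, reduce_fn, stitch_fn):
    """Split one hot key's values across several workers, then stitch the results.

    Assumption: reduce_fn can be applied to chunks independently and
    stitch_fn can combine the partial outputs (true for sums, counts, etc.).
    Workers are simulated sequentially here.
    """
    chunk = max(1, -(-len(values) // num_workers))  # ceiling division
    partials = [reduce_fn(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    return stitch_fn(partials)


# Hypothetical skewed input: key "a" dominates the dataset.
records = [("a", 1)] * 6 + [("b", 1)] * 3 + [("c", 1), ("d", 1)]
hot_keys = top_k_keys(records, k=1)
single, shared = assign_buckets(records, hot_keys, num_shared_buckets=2)
total_for_hot_key = reduce_single_key_bucket(
    single[hot_keys[0]], num_workers=3, reduce_fn=sum, stitch_fn=sum
)
```

In this sketch the hot key's values are reduced in three chunks and the partial sums are added together, which stands in for assigning more than one worker to a single-key bucket and stitching their output.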