Invention Grant
- Patent Title: System and method for joining skewed datasets in a distributed computing environment
- Application No.: US16991939
- Application Date: 2020-08-12
- Publication No.: US11615094B2
- Publication Date: 2023-03-28
- Inventor: Avnish Kumar Rastogi
- Applicant: HCL TECHNOLOGIES LIMITED
- Applicant Address: IN New Delhi
- Assignee: HCL TECHNOLOGIES LIMITED
- Current Assignee: HCL TECHNOLOGIES LIMITED
- Current Assignee Address: IN New Delhi
- Main IPC: G06F16/20
- IPC: G06F16/20 ; G06F16/2453 ; G06F16/2455 ; G06F16/27 ; G06F16/28 ; G06F16/21

Abstract:
Disclosed is a method and system for joining datasets in a distributed computing environment. The system comprises a memory 206 and a processor 202. The processor 202 identifies a skewed dataset among two or more datasets to be joined and obtains a replication parameter from a configuration file. The processor 202 then assigns a randomly chosen machine number to each chunk of the skewed dataset owned by the nodes (machines) involved in the join operation. Next, the processor 202 forms as many copies of the non-skewed dataset as the replication parameter specifies and adds the copy number to each sample of each copy. The processor 202 then merges these copies into a final copy, forming a single non-skewed dataset. It repeats these steps for every non-skewed dataset involved in the join operation, producing a merged copy of each non-skewed dataset, and then performs the join operation.
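The abstract does not include an algorithm listing. The following is therefore only a minimal, single-process Python sketch of the described steps for one skewed and one non-skewed dataset, under these assumptions: in-memory lists of `(key, payload)` records stand in for distributed datasets, the skewed dataset arrives already identified and split into chunks, and the configuration file uses a hypothetical `[join]` section with a `replication` key. All function names are illustrative, not taken from the patent.

```python
import configparser
import random
from collections import defaultdict


def read_replication_parameter(config_text):
    """Read the replication parameter (hypothetical [join]/replication layout)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    replication = parser.getint("join", "replication")  # assumed section/key names
    if replication < 1:
        raise ValueError("replication parameter must be >= 1")
    return replication


def salt_skewed_chunks(chunks, replication, rng):
    """Give each chunk one random machine number in [0, replication)
    and tag every record in that chunk with it."""
    salted = []
    for chunk in chunks:
        machine_no = rng.randrange(replication)
        salted.extend(((key, machine_no), payload) for key, payload in chunk)
    return salted


def replicate_and_merge(non_skewed, replication):
    """Form `replication` copies of the non-skewed dataset, tag each sample
    with its copy number, and merge all copies into one dataset."""
    return [((key, copy_no), payload)
            for copy_no in range(replication)
            for key, payload in non_skewed]


def skewed_join(skewed_chunks, non_skewed, replication, seed=None):
    """Inner join on key, routed through the composite key (key, number)."""
    rng = random.Random(seed)
    left = salt_skewed_chunks(skewed_chunks, replication, rng)
    right = replicate_and_merge(non_skewed, replication)
    # Hash join on the composite key. In a cluster, partitioning on
    # (key, number) would spread one hot key over up to `replication` machines.
    index = defaultdict(list)
    for composite_key, payload in right:
        index[composite_key].append(payload)
    return [(composite_key[0], left_payload, right_payload)
            for composite_key, left_payload in left
            for right_payload in index.get(composite_key, [])]


if __name__ == "__main__":
    replication = read_replication_parameter("[join]\nreplication = 4\n")
    chunks = [[("hot", i) for i in range(50)],
              [("hot", i) for i in range(50, 100)],
              [("cold", 0)]]
    rows = skewed_join(chunks, [("hot", "h"), ("cold", "c")], replication, seed=7)
    print(len(rows))  # 101: 100 "hot" matches plus 1 "cold" match
```

A skewed record tagged with number `s` meets exactly one copy (copy `s`) of each matching non-skewed record, so the output equals a plain inner join. The price is a merged non-skewed dataset that is `replication` times larger. For additional non-skewed datasets, the same replicate-and-merge step would be repeated for each one.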