Invention Grant
- Patent Title: Method and system for parallelization of ingestion of large data sets
- Application No.: US15909846
- Application Date: 2018-03-01
- Publication No.: US10831773B2
- Publication Date: 2020-11-10
- Inventor: Badih Schoueri, Gregory Gorshtein, Vladimir Antonevich
- Applicant: Next Pathway Inc.
- Applicant Address: CA Toronto
- Assignee: NEXT PATHWAY INC.
- Current Assignee: NEXT PATHWAY INC.
- Current Assignee Address: CA Toronto
- Agency: Lewis Roca Rothgerber Christie LLP
- Main IPC: G06F3/06
- IPC: G06F3/06; G06F16/10; G06F16/25; G06F16/84

Abstract:
Embodiments of the present invention relate to systems and methods for ingesting input data containing a plurality of records into a data lake. In an embodiment, the method comprises splitting the input data into a plurality of input splits consisting of a balanced number of records; reading the records from the plurality of input splits in parallel, regardless of the format and encoding of the input source; converting the input data within the records into at least one key/value pair; transforming the values of the input data into a serializable format; sorting the key/value pairs of the transformed values such that the records are sorted in the same order as they were read; writing the transformed values to an output file; and storing the output file to the data lake.
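The steps named in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function names (`balanced_splits`, `process_split`, `ingest`), the use of a thread pool, JSON as the serializable format, the global read position as the key, and a local file standing in for the data lake are all assumptions made for illustration only.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def balanced_splits(records, n_splits):
    """Cut records into n_splits contiguous chunks whose sizes differ by at most one.

    Each split is returned as (start_position, chunk) so keys can be derived later.
    """
    base, extra = divmod(len(records), n_splits)
    splits, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        splits.append((start, records[start:start + size]))
        start += size
    return splits


def process_split(split):
    """Turn each record into a (key, value) pair.

    Key: the record's global read position (assumed keying scheme).
    Value: the record transformed into a serializable form (JSON here).
    """
    start, chunk = split
    return [(start + i, json.dumps({"raw": rec})) for i, rec in enumerate(chunk)]


def ingest(records, out_path, n_splits=4):
    """Process splits in parallel, restore read order by sorting on key, write output."""
    pairs = []
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        futures = [pool.submit(process_split, s) for s in balanced_splits(records, n_splits)]
        # Splits may complete in any order, so the pairs arrive unordered.
        for fut in as_completed(futures):
            pairs.extend(fut.result())
    # Sorting on the key puts the records back in the order they were read.
    pairs.sort(key=lambda kv: kv[0])
    with open(out_path, "w", encoding="utf-8") as fh:
        for _, value in pairs:
            fh.write(value + "\n")
    return len(pairs)
```

Keying each record by its read position is one simple way to make the final sort reproduce the original order even when the parallel workers finish out of sequence; a real ingestion framework would likely use its own split and key mechanisms.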
Public/Granted literature
- US20180253478A1 METHOD AND SYSTEM FOR PARALLELIZATION OF INGESTION OF LARGE DATA SETS Public/Granted day: 2018-09-06