Invention Grant
- Patent Title: Stratified sampling using adaptive parallel data processing
- Application No.: US15208677
- Application Date: 2016-07-13
- Publication No.: US09697277B2
- Publication Date: 2017-07-04
- Inventor: Andrey Balmin, Vuk Ercegovac, Peter J. Haas, Liping Peng, John Sismanis
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Robert J. Shatto
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
A computer-implemented method includes partitioning a plurality of records into a plurality of splits. Each split includes at least a portion of the plurality of records. The method further includes providing at least one split of the plurality of splits to a mapper. The mapper scans the input data set, transforms each input record using a map function, and extracts a grouping key in parallel. The method further includes assigning at least a portion of the records of the at least one split to a group, where each assignment is based on a stratum of the assigned record, and filtering the records of the group, where each filtering is based on a comparison of a weight of a record to a local threshold of the mapper. The method further includes shuffling the group to a reducer and providing a stratified sampling of the plurality of records based on the group.
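The flow described in the abstract (partition, map and group by stratum, filter against a mapper-local threshold, shuffle, reduce) can be illustrated with a minimal single-process sketch. This is not the patented implementation. The abstract does not say how weights are assigned or how the local threshold is set, so the sketch assumes uniform random weights and a bottom-k rule: each mapper's threshold for a stratum is the k-th smallest weight it has seen so far. All function names (`partition`, `mapper`, `shuffle`, `reducer`, `stratified_sample`) are hypothetical.

```python
import heapq
import random
from collections import defaultdict


def partition(records, num_splits):
    """Partition the records into splits (round-robin, for illustration)."""
    return [records[i::num_splits] for i in range(num_splits)]


def mapper(split, stratum_of, k, rng):
    """Group records by stratum and filter them against a local threshold.

    Assumption (not from the abstract): weights are uniform random numbers,
    and the local threshold for a stratum is the k-th smallest weight this
    mapper has seen so far for that stratum.
    """
    # One max-heap per stratum (weights negated), so heap[0] is the threshold.
    groups = defaultdict(list)
    for seq, record in enumerate(split):
        weight = rng.random()
        heap = groups[stratum_of(record)]
        if len(heap) < k:
            # seq breaks ties so records themselves are never compared.
            heapq.heappush(heap, (-weight, seq, record))
        elif weight < -heap[0][0]:
            # Weight beats the local threshold: evict the current largest.
            heapq.heapreplace(heap, (-weight, seq, record))
    return {s: [(-nw, rec) for nw, _, rec in h] for s, h in groups.items()}


def shuffle(mapper_outputs):
    """Route every surviving (weight, record) pair to its stratum's reducer."""
    shuffled = defaultdict(list)
    for output in mapper_outputs:
        for stratum, weighted in output.items():
            shuffled[stratum].extend(weighted)
    return shuffled


def reducer(weighted_records, k):
    """Keep the k records with the smallest weights for one stratum."""
    smallest = heapq.nsmallest(k, weighted_records, key=lambda wr: wr[0])
    return [rec for _, rec in smallest]


def stratified_sample(records, stratum_of, k, num_splits=4, seed=0):
    """Run the whole pipeline sequentially and return {stratum: sample}."""
    rng = random.Random(seed)
    outputs = [mapper(s, stratum_of, k, rng) for s in partition(records, num_splits)]
    return {s: reducer(w, k) for s, w in shuffle(outputs).items()}
```

For example, `stratified_sample(records, lambda r: r["region"], k=5)` would return up to five records per distinct `region` value. Under the bottom-k assumption, filtering at the mapper only discards records that could not appear in the final per-stratum sample, which reduces the volume of data shuffled to the reducers.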
Public/Granted literature:
- US20160321350A1 STRATIFIED SAMPLING USING ADAPTIVE PARALLEL DATA PROCESSING (Public/Granted day: 2016-11-03)