- Patent Title: Deriving a multi-pass matching algorithm for data de-duplication
- Application No.: US14494875
- Application Date: 2014-09-24
- Publication No.: US10169418B2
- Publication Date: 2019-01-01
- Inventor: Hima P. Karanam, Albert Maier, Marvin Mendelssohn, Heather Stimpson, Dan Dan Zheng
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Ryan, Mason & Lewis, LLP
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
Methods, systems, and computer program products for deriving a multi-pass matching algorithm for data de-duplication are provided herein. A method includes identifying multiple passes across multiple databases using a set of one or more blocking columns derived from a set of trained input data; identifying, in each of the multiple passes, one or more columns across the multiple databases that match one or more of the blocking columns; selecting a given pass from the multiple passes, wherein said given pass comprises a maximum number of matching columns within the multiple passes; determining, for the given pass, data that conform to the given pass comprising (i) a set of matching columns, (ii) one or more matching types and (iii) one or more weights; and determining one or more subsequent passes across the multiple databases iteratively by removing the data that conform to the given pass.
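
The iterative structure described in the abstract can be illustrated with a minimal Python sketch. This is not the patented method: it assumes the trained input data is a list of record pairs already labeled as duplicates, treats a "pass" as a set of columns that agree exactly, and leaves out matching types and weights entirely. The names `agreeing_columns`, `derive_passes`, and the sample records are hypothetical and chosen only for illustration.

```python
def agreeing_columns(rec_a, rec_b, columns):
    """Return the columns on which two records hold the same non-empty value."""
    return frozenset(
        c for c in columns
        if rec_a.get(c) not in (None, "") and rec_a.get(c) == rec_b.get(c)
    )


def derive_passes(pairs, columns, blocking_columns):
    """Greedily derive passes from labeled duplicate pairs (illustrative only)."""
    blocking = set(blocking_columns)
    # Agreement pattern (set of exactly matching columns) of each training pair.
    remaining = [agreeing_columns(a, b, columns) for a, b in pairs]
    passes = []
    while True:
        # Candidate passes: patterns that include at least one blocking column.
        candidates = {p for p in remaining if p & blocking}
        if not candidates:
            break
        # Select the candidate with the most matching columns;
        # ties are broken by the sorted column names (an assumption).
        best = max(candidates, key=lambda p: (len(p), sorted(p)))
        conforming = [p for p in remaining if best <= p]
        passes.append({
            "blocking": sorted(best & blocking),
            "matching": sorted(best - blocking),
            "covered_pairs": len(conforming),
        })
        # Remove the data that conform to this pass, then iterate.
        remaining = [p for p in remaining if not best <= p]
    return passes


# Hypothetical training pairs, each labeled as a duplicate.
example_pairs = [
    ({"name": "Ann Lee", "zip": "10001", "phone": "555-0100", "email": "ann@example.com"},
     {"name": "Ann Lee", "zip": "10001", "phone": "555-0100", "email": "a.lee@example.com"}),
    ({"name": "Bob Ray", "zip": "94105", "phone": "555-0101", "email": "bob@example.com"},
     {"name": "Robert Ray", "zip": "94105", "phone": "555-0199", "email": "bob@example.com"}),
    ({"name": "Cy Oz", "zip": "60601", "phone": "555-0102", "email": "cy@example.com"},
     {"name": "Cy Oz", "zip": "60601", "phone": "555-0102", "email": "cy@example.com"}),
]

passes = derive_passes(
    example_pairs,
    columns=["name", "zip", "phone", "email"],
    blocking_columns=["zip", "phone"],
)
for p in passes:
    print(p)
```

In this sketch the widest agreement pattern becomes the first pass, the pairs it covers are removed, and narrower passes are derived from what remains. Pairs that agree on no blocking column are left uncovered.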
Public/Granted literature:
- US20160085807A1 Deriving a Multi-Pass Matching Algorithm for Data De-Duplication (Public/Granted day: 2016-03-24)