Invention Grant
- Patent Title: Data de-duplication
- Application No.: US14716910
- Application Date: 2015-05-20
- Publication No.: US10467203B2
- Publication Date: 2019-11-05
- Inventor: Namit Kabra, Yannick Saillet
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Steven F. McDaniel; David S. Richart; Arnold B. Bangali
- Main IPC: G06F16/215
- IPC: G06F16/215; G06F16/23

Abstract:
A method, executed by a computer, for de-duplicating data includes receiving a dataset, pivoting the dataset along a set of columns that have a common domain to provide a pivoted dataset, de-duplicating the pivoted dataset to provide a de-duplicated dataset, and using the de-duplicated dataset. De-duplicating the pivoted dataset may include computing similarity scores for records that have different primary keys and merging records that have a similarity score that exceeds a selected threshold value. The method may include determining the set of columns having a common domain by referencing a business catalog and/or conducting a data classification operation on some or all of the columns of the dataset. The method may also include pivoting the dataset along another set of columns that have a different common domain. A computer system and computer program product corresponding to the method are also disclosed herein.
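The flow summarized in the abstract (pivot along columns that share a domain, score records with different primary keys, merge those above a threshold) might be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names, the sample records, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.9 threshold are all assumptions introduced here.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pivot(records, key, columns):
    """Pivot: emit one (primary key, value) row per non-empty cell in `columns`."""
    return [(rec[key], rec[col]) for rec in records for col in columns if rec.get(col)]


def similar_key_pairs(pivoted, threshold):
    """Score pivoted rows that have different primary keys (assumed measure:
    difflib ratio) and return the key pairs whose score exceeds `threshold`."""
    pairs = set()
    for (k1, v1), (k2, v2) in combinations(pivoted, 2):
        if k1 != k2 and SequenceMatcher(None, v1, v2).ratio() > threshold:
            pairs.add((min(k1, k2), max(k1, k2)))
    return pairs


def merge_records(records, key, pairs):
    """Merge records linked by `pairs`; the first non-empty value per column wins."""
    parent = {rec[key]: rec[key] for rec in records}

    def find(k):
        # Follow parent links up to the canonical key of the group.
        while parent[k] != k:
            k = parent[k]
        return k

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    merged = {}
    for rec in records:
        root = find(rec[key])
        target = merged.setdefault(root, {key: root})
        for col, value in rec.items():
            if col != key and value and not target.get(col):
                target[col] = value
    return list(merged.values())


def deduplicate(records, key, columns, threshold=0.9):
    """Pivot along `columns`, find similar rows, then merge the original records."""
    pairs = similar_key_pairs(pivot(records, key, columns), threshold)
    return merge_records(records, key, pairs)


# Hypothetical sample data: the same number appears under different columns.
records = [
    {"id": 1, "name": "Ann Lee", "home_phone": "555-0100", "work_phone": ""},
    {"id": 2, "name": "A. Lee", "home_phone": "", "work_phone": "555-0100"},
    {"id": 3, "name": "Bob Ray", "home_phone": "555-0199", "work_phone": ""},
]
result = deduplicate(records, "id", ["home_phone", "work_phone"])
```

In this sample, records 1 and 2 hold the same number in different phone columns, so a column-by-column comparison would not line them up. After pivoting, both values fall into a single column and the two records are merged, while record 3 stays separate.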
Public/Granted literature
- US20160092479A1 DATA DE-DUPLICATION Public/Granted day: 2016-03-31