Invention Grant
- Patent Title: Method and system for deduplicating data
- Application No.: US15824012
- Application Date: 2017-11-28
- Publication No.: US10528534B2
- Publication Date: 2020-01-07
- Inventor: Namit Kabra, Yannick Saillet
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agent: Stephen R. Tkacs; Stephen J. Walder, Jr.; Robert C. Bunker
- Main IPC: G06F16/00
- IPC: G06F16/00; G06F16/215; G06F16/23

Abstract:
A mechanism is provided for deduplicating a set of records of data. The mechanism identifies a subset of records each having one or more invalid attribute values. For each invalid attribute value of a given attribute the mechanism determines one or more associated valid candidates of attribute values of the given attribute using the set of records. For each record of the subset of records the mechanism replaces the one or more invalid attribute values by one or more combinations of the determined valid candidates of attribute values, resulting in a modified set of records. The mechanism selects a subset of records of the modified set of records that satisfy a consistency condition on the attribute values of each record. The mechanism deduplicates the selected subset of records of the modified set of records responsive to determining the subset of records comprises more than one record.
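
The steps in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `deduplicate`, the `validators` and `is_consistent` parameters, the choice of taking candidates from all valid values observed for the same attribute, the exact-match deduplication, and the sample records are all assumptions made for illustration.

```python
from itertools import product


def deduplicate(records, validators, is_consistent):
    """Hypothetical sketch of the abstract's steps.

    records: list of dicts with hashable values
    validators: attribute name -> predicate returning True for a valid value
    is_consistent: record -> bool (the consistency condition; assumed here)
    """
    # Assumption: valid candidates for an attribute are the valid values
    # already observed for that attribute anywhere in the set of records.
    candidates = {
        attr: sorted({r[attr] for r in records if ok(r[attr])})
        for attr, ok in validators.items()
    }

    modified = []
    for rec in records:
        invalid = [a for a, ok in validators.items() if not ok(rec[a])]
        if not invalid:
            modified.append(dict(rec))
            continue
        # One variant of the record per combination of candidate values.
        # If an invalid attribute has no candidates, no variant is produced.
        for combo in product(*(candidates[a] for a in invalid)):
            modified.append({**rec, **dict(zip(invalid, combo))})

    # Keep only the records that satisfy the consistency condition.
    selected = [r for r in modified if is_consistent(r)]
    if len(selected) <= 1:
        return selected  # nothing to deduplicate

    # Assumption: exact-match deduplication, keeping first-seen order.
    seen, unique = set(), []
    for r in selected:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Illustrative data only; the ZIP-to-city table is a made-up consistency rule.
ZIP_CITY = {"75001": "Paris", "69001": "Lyon"}
records = [
    {"name": "Ann", "city": "Paris", "zip": "75001"},
    {"name": "Ann", "city": "Paris", "zip": "??"},
    {"name": "Bob", "city": "Lyon", "zip": "69001"},
]
validators = {"zip": lambda z: z.isdigit() and len(z) == 5}
result = deduplicate(
    records, validators, lambda r: ZIP_CITY.get(r["zip"]) == r["city"]
)
```

In this example the second record's invalid ZIP code is replaced by each valid candidate, only the variant consistent with the city survives, and it then collapses into the first record, leaving two records in `result`.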
Public/Granted literature
- US20180089235A1 Method and System for Deduplicating Data (Public/Granted day: 2018-03-29)