Invention Grant
- Patent Title: Quality-performance optimized identification of duplicate data
- Application No.: US17357176
- Application Date: 2021-06-24
- Publication No.: US11573721B2
- Publication Date: 2023-02-07
- Inventor: Soma Shekar Naganna, Abhishek Seth, Neeraj Ramkrishna Singh
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Keohane & D'Alessandro, PLLC
- Agent: Rakesh Roy; Hunter E. Webb
- Main IPC: G06F3/06
- IPC: G06F3/06; G06F9/50

Abstract:
An approach is provided for optimized identification of duplicate data in a networked computing environment. An aggregate feature vector is created that is specific to an attribute of the data (e.g., a field that holds specific informational content). The aggregate feature vector has a set of dimensions, each of which defines a specific comparison function used to test for similarity between data entries in the attribute. Each dimension in the aggregate feature vector is assigned an effectiveness, and a cost is computed for each dimension. Based on the effectiveness and cost of each dimension, a subset of dimensions is selected to form an optimized feature vector. This optimized feature vector can then be used to analyze a dataset to find matching data.
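
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how effectiveness and cost are combined, so everything here is an assumption made for illustration: the `Dimension` type, the example comparison functions, the numeric effectiveness and cost values, the `cost_budget` parameter, and the greedy effectiveness-per-cost ranking are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class Dimension:
    """One dimension of a feature vector for a single attribute (hypothetical type)."""
    name: str
    compare: Callable[[str, str], float]  # returns a similarity in [0, 1]
    effectiveness: float                  # assumed: higher means better at finding duplicates
    cost: float                           # assumed: relative compute cost, must be > 0


def exact_match(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def case_insensitive_match(a: str, b: str) -> float:
    return 1.0 if a.lower() == b.lower() else 0.0


def token_jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sequence_ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def select_dimensions(dims: List[Dimension], cost_budget: float) -> List[Dimension]:
    """Greedy selection (an assumption, not the patented method): rank by
    effectiveness per unit cost, keep each dimension that still fits the budget."""
    ranked = sorted(dims, key=lambda d: d.effectiveness / d.cost, reverse=True)
    chosen: List[Dimension] = []
    spent = 0.0
    for d in ranked:
        if spent + d.cost <= cost_budget:
            chosen.append(d)
            spent += d.cost
    return chosen


def similarity(a: str, b: str, dims: List[Dimension]) -> float:
    """Effectiveness-weighted average of the selected comparison functions."""
    total = sum(d.effectiveness for d in dims)
    if total == 0:
        return 0.0
    return sum(d.effectiveness * d.compare(a, b) for d in dims) / total


# Aggregate feature vector for a hypothetical "company name" attribute.
aggregate = [
    Dimension("exact", exact_match, effectiveness=0.3, cost=1.0),
    Dimension("case_insensitive", case_insensitive_match, effectiveness=0.5, cost=1.0),
    Dimension("token_jaccard", token_jaccard, effectiveness=0.9, cost=2.0),
    Dimension("sequence_ratio", sequence_ratio, effectiveness=0.95, cost=10.0),
]

# Optimized feature vector: the subset that fits an assumed cost budget.
optimized = select_dimensions(aggregate, cost_budget=3.0)
score = similarity("IBM Corp", "ibm corp", optimized)
```

In this sketch the expensive `sequence_ratio` dimension is dropped even though it has the highest effectiveness, because its effectiveness per unit cost is the lowest. A different trade-off rule (for example, a minimum effectiveness threshold) would fit the abstract's wording equally well.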
Public/Granted literature
- US20220413727A1 QUALITY-PERFORMANCE OPTIMIZED IDENTIFICATION OF DUPLICATE DATA Public/Granted day: 2022-12-29