Invention Grant
- Patent Title: Method and system for identifying duplicate columns using statistical, semantics and machine learning techniques
- Application No.: US17136124
- Application Date: 2020-12-29
- Publication No.: US11561944B2
- Publication Date: 2023-01-24
- Inventor: Ganesh Prasath Ramani, Aasish Chandra, Jayanth Shenai, Raja Angamuthu, Pankaj Kumar Mishra
- Applicant: Tata Consultancy Services Limited
- Applicant Address: IN Mumbai
- Assignee: Tata Consultancy Services Limited
- Current Assignee: Tata Consultancy Services Limited
- Current Assignee Address: IN Mumbai
- Agency: Finnegan, Henderson, Farabow, Garrett & Dunner LLP
- Priority: IN202021013506 20200327
- Main IPC: G06F16/21
- IPC: G06F16/21 ; G06F16/215 ; G06F16/22 ; G06N20/00 ; G06F40/211 ; G06F40/30

Abstract:
With the availability of huge amounts of data, it has become difficult to identify and manage duplicate data, especially when the data is spread across a plurality of columns. A method and system for identifying duplicate columns using statistical, semantic, and machine learning techniques are provided. The system provides a design framework to compare huge datasets at the column level and identify potential duplicate columns based not on the column title but on all of the column's values. The disclosure can compare values in multiple columns and identify potential duplicate columns, where the comparison covers not only exact matches but also semantic matches, smart matches, fuzzy matches, and matches after unit of measure (UOM) conversion, using statistical, semantic, and machine learning techniques.
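As a rough illustration of value-based column comparison, the following Python sketch scores two columns on exact overlap, fuzzy overlap, and overlap after unit conversion. It is not the patented method: the function names, the 0.85 similarity threshold, and the `TO_METRES` conversion table are assumptions made for this example, and the semantic and machine learning matching mentioned in the abstract is not covered.

```python
from difflib import SequenceMatcher


def exact_overlap(col_a, col_b):
    """Jaccard similarity of the distinct values in two columns."""
    a, b = set(col_a), set(col_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def fuzzy_overlap(col_a, col_b, threshold=0.85):
    """Fraction of distinct values in col_a that have a near match in col_b.

    Values are lower-cased and stripped before comparison. The threshold
    is an assumed cut-off on difflib's similarity ratio.
    """
    a = {str(v).strip().lower() for v in col_a}
    b = {str(v).strip().lower() for v in col_b}
    if not a or not b:
        return 0.0
    hits = sum(
        1
        for x in a
        if any(SequenceMatcher(None, x, y).ratio() >= threshold for y in b)
    )
    return hits / len(a)


# Hypothetical conversion factors to a common base unit (metres).
TO_METRES = {"m": 1.0, "cm": 0.01, "in": 0.0254}


def uom_overlap(col_a, unit_a, col_b, unit_b, ndigits=6):
    """Jaccard similarity after converting both columns to metres."""
    a = {round(v * TO_METRES[unit_a], ndigits) for v in col_a}
    b = {round(v * TO_METRES[unit_b], ndigits) for v in col_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A pair of columns scoring high on any of these measures could be flagged as a potential duplicate regardless of how the columns are titled. Comparing every value pair in `fuzzy_overlap` is quadratic in the number of distinct values, so a sketch like this would need sampling or blocking before it could be applied to large datasets.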