Invention Grant
- Patent Title: Method and system for linking heterogeneous data sources
- Application No.: US14577220
- Application Date: 2014-12-19
- Publication No.: US10235633B2
- Publication Date: 2019-03-19
- Inventor: Vladimir Tereshkov, Syed Haider, Valerio Aimale, Joshua Hartman, Christopher Bound, Ron Katriel
- Applicant: Medidata Solutions, Inc.
- Applicant Address: US NY New York
- Assignee: Medidata Solutions, Inc.
- Current Assignee: Medidata Solutions, Inc.
- Current Assignee Address: US NY New York
- Agency: Steptoe & Johnson LLP
- Agent: Robert Greenfeld
- Main IPC: G06N20/20
- IPC: G06N20/20; G06N20/00; G06F16/25; G06F16/28; G06F16/22; G06F16/33; G06F16/215

Abstract:
A method for linking records (related to an entity) from separate databases may include extracting a first record from a first database as a first vector, extracting a second record from a second database as a second vector, generating first and second sub-vectors for the first and second vectors, where each sub-vector includes quality features from the respective vector, pre-processing the first and second sub-vectors using domain knowledge, calculating a distance assessment classifier based on the first and second sub-vectors, and determining whether the distance represented by the distance assessment classifier is greater than a threshold. If the distance is greater than the threshold, the records may be linked; if not, the method extracts additional records and repeats after generating first and second sub-vectors until the distance is greater than the threshold. A system for linking records is also disclosed.
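The control flow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the field names in `QUALITY_FEATURES`, the lowercase-and-trim `preprocess` step, the match-fraction `score` function (a simple stand-in for the distance assessment classifier), and the `0.6` threshold are all hypothetical choices made for the example.

```python
from itertools import product

# Hypothetical "quality features": fields assumed to be reliable in both sources.
QUALITY_FEATURES = ("last_name", "birth_year", "site")


def sub_vector(record, features=QUALITY_FEATURES):
    """Keep only the quality features of a record, in a fixed order."""
    return tuple(record.get(name) for name in features)


def preprocess(vector):
    """Stand-in for domain-knowledge pre-processing: trim and lowercase text."""
    return tuple(v.strip().lower() if isinstance(v, str) else v for v in vector)


def score(a, b):
    """Stand-in for the distance assessment classifier: share of agreeing features."""
    matches = sum(1 for x, y in zip(a, b) if x is not None and x == y)
    return matches / len(a)


def link_records(first_db, second_db, threshold=0.6):
    """Return the first record pair whose score exceeds the threshold, else None."""
    for first, second in product(first_db, second_db):
        a = preprocess(sub_vector(first))
        b = preprocess(sub_vector(second))
        if score(a, b) > threshold:
            return first, second
    return None


# Toy data, invented for this example only.
first_db = [{"last_name": "Smith ", "birth_year": 1970, "site": "NYC-01"}]
second_db = [
    {"last_name": "Jones", "birth_year": 1982, "site": "BOS-02"},
    {"last_name": "smith", "birth_year": 1970, "site": "nyc-01"},
]
linked = link_records(first_db, second_db)
```

Here the first candidate pair shares no features, so the loop moves on to the next record, mirroring the "extract additional records and repeat" step. The second pair agrees on every feature after pre-processing and is linked.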
Public/Granted literature
- US20160180245A1 METHOD AND SYSTEM FOR LINKING HETEROGENEOUS DATA SOURCES, Public/Granted day: 2016-06-23