Invention Grant
- Patent Title: Dataset connector and crawler to identify data lineage and segment data
- Application No.: US17505840
- Application Date: 2021-10-20
- Publication No.: US11989597B2
- Publication Date: 2024-05-21
- Inventor: Austin Walters, Mark Watson, Galen Rafferty, Anh Truong, Jeremy Goodsitt, Vincent Pham
- Applicant: CAPITAL ONE SERVICES, LLC
- Applicant Address: US VA McLean
- Assignee: Capital One Services, LLC
- Current Assignee: Capital One Services, LLC
- Current Assignee Address: US VA McLean
- Agency: FINNEGAN, HENDERSON, FARABOW, GARRETT & DUNNER, LLP
- Main IPC: G06F16/00
- IPC: G06F16/00 ; G06F8/71 ; G06F9/54 ; G06F11/36 ; G06F16/22 ; G06F16/242 ; G06F16/2455 ; G06F16/248 ; G06F16/25 ; G06F16/28 ; G06F16/335 ; G06F16/903 ; G06F16/9032 ; G06F16/9038 ; G06F16/906 ; G06F16/93 ; G06F17/15 ; G06F17/16 ; G06F17/18 ; G06F18/20 ; G06F18/21 ; G06F18/2115 ; G06F18/213 ; G06F18/214 ; G06F18/22 ; G06F18/23 ; G06F18/24 ; G06F18/2411 ; G06F18/2415 ; G06F18/40 ; G06F21/55 ; G06F21/60 ; G06F21/62 ; G06F30/20 ; G06F40/117 ; G06F40/166 ; G06F40/20 ; G06N3/04 ; G06N3/044 ; G06N3/045 ; G06N3/06 ; G06N3/08 ; G06N3/088 ; G06N5/00 ; G06N5/02 ; G06N5/04 ; G06N7/00 ; G06N7/01 ; G06N20/00 ; G06Q10/04 ; G06T7/194 ; G06T7/246 ; G06T7/254 ; G06T11/00 ; G06V10/70 ; G06V10/98 ; G06V30/194 ; G06V30/196 ; H04L9/40 ; H04L67/00 ; H04L67/306 ; H04N21/234 ; H04N21/81

Abstract:
Systems and methods for connecting datasets are disclosed. For example, a system may include a memory unit storing instructions and a processor configured to execute the instructions to perform operations. The operations may include receiving a plurality of datasets and a request to identify a cluster of connected datasets among the received plurality of datasets. The operations may include selecting a dataset. In some embodiments, the operations include identifying a data schema of the selected dataset and determining a statistical metric of the selected dataset. The operations may include identifying foreign key scores. The operations may include generating a plurality of edges between the datasets based on the foreign key scores, the data schema, and the statistical metric. The operations may include segmenting and returning datasets based on the plurality of edges.
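The flow described in the abstract (score candidate foreign keys, draw edges between datasets, then segment the datasets by those edges) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the function names, the value-overlap definition of a foreign key score, the use of Python type names as a stand-in for a data schema, the 0.8 threshold, and the connected-components segmentation. The statistical metric mentioned in the abstract is omitted for brevity.

```python
from itertools import combinations


def foreign_key_score(child_values, parent_values):
    """Fraction of distinct child values that also occur in the parent column.

    Hypothetical scoring rule chosen for this sketch.
    """
    child, parent = set(child_values), set(parent_values)
    if not child:
        return 0.0
    return len(child & parent) / len(child)


def column_type(values):
    """Crude schema proxy: the set of Python type names seen in a column."""
    return frozenset(type(v).__name__ for v in values)


def build_edges(datasets, threshold=0.8):
    """Return {(name_a, name_b): best_score} for dataset pairs that look linked."""
    edges = {}
    for (name_a, cols_a), (name_b, cols_b) in combinations(datasets.items(), 2):
        best = 0.0
        for vals_a in cols_a.values():
            for vals_b in cols_b.values():
                if column_type(vals_a) != column_type(vals_b):
                    continue  # schema mismatch: skip this column pair
                # Score both directions, since either side may hold the key.
                score = max(foreign_key_score(vals_a, vals_b),
                            foreign_key_score(vals_b, vals_a))
                best = max(best, score)
        if best >= threshold:
            edges[(name_a, name_b)] = best
    return edges


def segment(datasets, edges):
    """Group dataset names into connected components of the edge graph."""
    neighbors = {name: set() for name in datasets}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, clusters = set(), []
    for start in datasets:
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:  # depth-first walk over one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            cluster.add(node)
            stack.extend(neighbors[node] - seen)
        clusters.append(cluster)
    return clusters


# Toy input: each dataset is a mapping of column name to a list of values.
datasets = {
    "customers": {"customer_id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]},
    "orders": {"order_id": [10, 11, 12], "customer_id": [1, 2, 2]},
    "weather": {"station": ["x", "y"], "temp_c": [12.5, 9.0]},
}
edges = build_edges(datasets)
clusters = segment(datasets, edges)
```

In this toy input, every `customer_id` in `orders` also appears in `customers`, so the two datasets are joined by an edge and fall into one cluster, while `weather` shares no compatible overlapping column and stays on its own.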
Public/Granted literature
- US20220083402A1 DATASET CONNECTOR AND CRAWLER TO IDENTIFY DATA LINEAGE AND SEGMENT DATA Public/Granted day: 2022-03-17