- Patent Title: Tracking missing data using provenance traces and data simulation
- Application No.: US16105454
- Application Date: 2018-08-20
- Publication No.: US10740209B2
- Publication Date: 2020-08-11
- Inventor: Salil Joshi, Hima Prasad Karanam, Manish Kesarwani, Sameep Mehta
- Applicant: International Business Machines Corporation
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Ryan, Mason & Lewis, LLP
- Main IPC: G06F9/44
- IPC: G06F9/44; G06F11/34; G06N7/00; G06N20/00

Abstract:
Methods, systems, and computer program products for tracking missing data using provenance traces and data simulation are provided herein. A computer-implemented method includes generating, for each of multiple stages in a data curation sequence, a machine learning model of the data curation sequence, wherein the model is based on historical input records within the data curation sequence, historical output records within the data curation sequence, and provenance data within the data curation sequence; creating a simulated output record based on a detected anomaly corresponding to the data curation sequence; predicting the content of absent input records that precede the simulated output record in the data curation sequence and provenance data corresponding to the simulated output record; and outputting, to a user, in response to a query pertaining to the detected anomaly, the predicted input records and information relating the predicted input records to the detected anomaly.
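The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say what kind of machine learning model is used, so a simple least-squares line stands in for each per-stage model. The names `StageModel`, `trace_missing`, the two toy stages ("clean" and "enrich"), and all record values are hypothetical, and the anomaly detector is assumed to exist elsewhere and to supply the simulated output value.

```python
from dataclasses import dataclass


@dataclass
class StageModel:
    """Hypothetical per-stage model: a least-squares line from output back to input."""
    name: str
    slope: float = 0.0
    intercept: float = 0.0

    def fit(self, inputs, outputs, provenance):
        # provenance maps each output record id to the input record id it came from.
        pairs = [(outputs[o], inputs[i]) for o, i in provenance.items()]
        mean_out = sum(o for o, _ in pairs) / len(pairs)
        mean_in = sum(i for _, i in pairs) / len(pairs)
        var = sum((o - mean_out) ** 2 for o, _ in pairs)
        cov = sum((o - mean_out) * (i - mean_in) for o, i in pairs)
        self.slope = cov / var
        self.intercept = mean_in - self.slope * mean_out
        return self

    def predict_input(self, output_value):
        # Estimate the input record that would have produced this output.
        return self.slope * output_value + self.intercept


def trace_missing(stage_models, simulated_output):
    """Walk the stages last-to-first, predicting the absent record at each step."""
    trace, value = [], simulated_output
    for model in reversed(stage_models):
        value = model.predict_input(value)
        trace.append((model.name, value))
    return trace


# Toy history (assumed): stage "clean" doubles a raw value, stage "enrich" adds 10.
raw = {f"r{k}": float(k) for k in range(1, 6)}
cleaned = {f"c{k}": 2.0 * k for k in range(1, 6)}
enriched = {f"e{k}": 2.0 * k + 10.0 for k in range(1, 6)}

clean = StageModel("clean").fit(raw, cleaned, {f"c{k}": f"r{k}" for k in range(1, 6)})
enrich = StageModel("enrich").fit(cleaned, enriched, {f"e{k}": f"c{k}" for k in range(1, 6)})

# Assumed anomaly: an output of 24.0 was expected but never appeared.
simulated_output = 24.0
predicted = trace_missing([clean, enrich], simulated_output)
for stage, value in predicted:
    print(f"input to stage '{stage}' predicted as {value}")
```

In this toy run the backward trace attributes the missing output to an absent raw record with a value of about 7. A real system would need models that handle structured records and provenance metadata rather than single numbers.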
Public/Granted literature
- US20200057708A1 Tracking Missing Data Using Provenance Traces and Data Simulation (Public/Granted day: 2020-02-20)