Invention Grant
- Patent Title: System and method for combining data sets
- Application No.: US14627198
- Application Date: 2015-02-20
- Publication No.: US09881031B1
- Publication Date: 2018-01-30
- Inventors: Jing Lin, David Fogarty, Chit Ming Yip, Wanyu Liao
- Applicant: CIGNA Intellectual Property, Inc.
- Applicant Address: US DE Wilmington
- Assignee: Cigna Intellectual Property, Inc.
- Current Assignee: Cigna Intellectual Property, Inc.
- Current Assignee Address: US DE Wilmington
- Agency: Morgan, Lewis & Bockius, LLP
- Main IPC: G06F15/18
- IPC: G06F15/18; G06F17/30; G06N99/00; G06N5/04

Abstract:
Embodiments of the invention involve receiving a first set of data describing one or more first observations and a second set of data describing one or more second observations. The first set of data comprises at least two types of data, and the second set of data comprises at least two types of data. At least one of the types of data in the first data set is common with at least one of the types of data in the second data set. The data of the common types constitute the common data of the first and second data sets; the data of the types that are not common constitute the exclusive data of each of the first and second data sets. A first multiple regression model is developed for the first data set, with the common data of the first data set used as the independent variables and the exclusive data of the first data set used as the dependent variables. A second multiple regression model is developed for the second data set, with the common data of the second data set used as the independent variables and the exclusive data of the second data set used as the dependent variables. Prediction results of the first and second multiple regression models are received. Based on the prediction results, at least some of the first and second observations are classified as reasonable observations, that is, well-predicted observations, and at least some are classified as outlier observations, that is, observations not classified as well predicted. The outlier observations are removed. The reasonable observations are assigned to intervals for each of the types of data, and, based on that assignment, the observations are merged to create a third data set.
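The abstract leaves the concrete rules open: it does not say how "well predicted" is measured, how the intervals are chosen, or how binned observations are combined. The Python sketch below fills those gaps with stated assumptions, so it illustrates the described pipeline rather than reproducing the patented implementation. All function names (`fit_ols`, `reasonable`, `merge`) and parameters (`k`, `n_bins`) are hypothetical. It assumes that (1) each exclusive variable gets its own ordinary least-squares fit on the common variables, which is equivalent to a multivariate multiple regression sharing the same regressors; (2) an observation is well predicted when every one of its residuals is within `k` times that variable's root-mean-square residual; (3) intervals are equal-width bins over the common variables only; and (4) the third data set holds one record per interval cell populated by both sets, with each set's exclusive data averaged within the cell.

```python
import math
from typing import Dict, List, Sequence, Tuple

Obs = Dict[str, float]  # one observation: type of data -> value


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows: Sequence[Obs], xs: List[str], y: str) -> List[float]:
    """Least-squares fit of y on xs via the normal equations; returns [intercept, *slopes]."""
    design = [[1.0] + [r[v] for v in xs] for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * r[y] for d, r in zip(design, rows)) for i in range(p)]
    return solve(xtx, xty)


def reasonable(rows: List[Obs], common: List[str], k: float = 2.0) -> List[Obs]:
    """Drop outliers: keep rows whose residual on every exclusive variable is
    within k * RMS residual of that variable (a hypothetical 'well-predicted' rule)."""
    exclusive = [v for v in rows[0] if v not in common]
    keep = [True] * len(rows)
    for y in exclusive:  # one regression per dependent (exclusive) variable
        beta = fit_ols(rows, common, y)
        res = [r[y] - beta[0] - sum(b * r[v] for b, v in zip(beta[1:], common)) for r in rows]
        rms = math.sqrt(sum(e * e for e in res) / len(res))
        for i, e in enumerate(res):
            if abs(e) > k * rms + 1e-9:  # small floor guards against round-off
                keep[i] = False
    return [r for r, ok in zip(rows, keep) if ok]


def merge(set_a: List[Obs], set_b: List[Obs], n_bins: int = 4, k: float = 2.0) -> List[Obs]:
    """Clean both sets, bin the common variables into equal-width intervals, and
    emit one merged record per interval cell populated by both sets."""
    common = sorted(set(set_a[0]) & set(set_b[0]))
    ra, rb = reasonable(set_a, common, k), reasonable(set_b, common, k)
    edges: Dict[str, Tuple[float, float]] = {}
    for c in common:
        vals = [r[c] for r in ra + rb]
        lo, hi = min(vals), max(vals)
        edges[c] = (lo, (hi - lo) / n_bins or 1.0)  # (lower bound, interval width)

    def cell(r: Obs) -> Tuple[int, ...]:
        return tuple(min(int((r[c] - edges[c][0]) / edges[c][1]), n_bins - 1) for c in common)

    groups_a: Dict[Tuple[int, ...], List[Obs]] = {}
    groups_b: Dict[Tuple[int, ...], List[Obs]] = {}
    for rows, groups in ((ra, groups_a), (rb, groups_b)):
        for r in rows:
            groups.setdefault(cell(r), []).append(r)

    merged: List[Obs] = []
    for key in sorted(groups_a.keys() & groups_b.keys()):
        rec: Obs = {f"{c}_interval": i for c, i in zip(common, key)}
        for rows in (groups_a[key], groups_b[key]):
            for v in rows[0]:
                if v not in common:  # average each set's exclusive data in the cell
                    rec[v] = sum(r[v] for r in rows) / len(rows)
        merged.append(rec)
    return merged


# Toy data: common variable "x"; exclusive "a" (set A) and "b" (set B), one outlier each.
set_a = [{"x": float(x), "a": 100.0 if x == 10 else 2.0 * x + 1} for x in range(1, 21)]
set_b = [{"x": float(x), "b": -80.0 if x == 5 else 3.0 * x} for x in range(1, 21)]
merged = merge(set_a, set_b)
for rec in merged:
    print(rec)  # first line: {'x_interval': 0, 'a': 7.0, 'b': 7.5}
```

With this toy data, the rule flags `x = 10` in set A and `x = 5` in set B as outliers, and the result has four merged records, one per interval of `x`. A single-pass fit like this can be pulled toward high-leverage outliers; an iterative refit or a robust regression would be natural alternatives, but the abstract does not say which, if either, is intended.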