Systems and/or methods for machine-learning based data correction and completion in sparse datasets
Abstract:
Certain example embodiments herein relate to techniques for automatically correcting and completing data in sparse datasets. Records in the dataset are divided into groups whose properties have similar values. For each group, one or more properties of the records therein that are to be ignored are identified, based on record distances relative to the records in the group and on distances among the values for each of the properties of the records in the respective group. The records in each group are further divided into sub-groups without regard to the one or more properties that are to be ignored, so that each sub-group includes a smaller and more cohesive set of records. For each sub-group, predicted values for the values identified as being empty but needing to be filled in are determined based on the records therein, and those predicted values are applied. The corrected/completed dataset is provided as output.
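The flow described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the abstract does not name a distance measure, a grouping algorithm, or a predictor, so the mean absolute difference, the greedy threshold grouping, the spread-ratio test for ignored properties, and the median fill used here are all stand-ins. The function names (`distance`, `cluster`, `noisy_properties`, `complete`) and the parameters (`group_threshold`, `sub_threshold`, `ratio`) are hypothetical.

```python
from math import inf
from statistics import mean, median


def distance(a, b, props):
    """Mean absolute difference over properties filled in for both records (assumed metric)."""
    shared = [p for p in props if a.get(p) is not None and b.get(p) is not None]
    if not shared:
        return inf
    return mean(abs(a[p] - b[p]) for p in shared)


def cluster(records, props, threshold):
    """Greedy grouping (assumption): a record joins the first group whose seed is close enough."""
    groups = []
    for rec in records:
        for group in groups:
            if distance(rec, group[0], props) <= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


def noisy_properties(group, props, ratio):
    """Properties whose mean pairwise value gap exceeds ratio x the mean record distance."""
    pairs = [(a, b) for i, a in enumerate(group) for b in group[i + 1:]]
    finite = [d for d in (distance(a, b, props) for a, b in pairs) if d != inf]
    if not finite:
        return set()
    base = mean(finite)
    ignored = set()
    for p in props:
        gaps = [abs(a[p] - b[p]) for a, b in pairs
                if a.get(p) is not None and b.get(p) is not None]
        if gaps and mean(gaps) > ratio * base:
            ignored.add(p)
    return ignored


def complete(records, props, group_threshold, sub_threshold, ratio=1.5):
    """Return a copy of the records with empty (None) values filled per sub-group."""
    out = [dict(r) for r in records]  # work on copies; the input is left untouched
    for group in cluster(out, props, group_threshold):
        ignored = noisy_properties(group, props, ratio)
        kept = [p for p in props if p not in ignored]
        # Sub-group without regard to the ignored properties.
        for sub in cluster(group, kept, sub_threshold):
            for p in props:
                known = [r[p] for r in sub if r.get(p) is not None]
                if not known:
                    continue  # nothing to predict from in this sub-group
                fill = median(known)  # assumed predictor
                for r in sub:
                    if r.get(p) is None:
                        r[p] = fill
    return out


if __name__ == "__main__":
    sample = [
        {"x": 1, "y": 1, "noise": 0},
        {"x": 1, "y": None, "noise": 9},
        {"x": 20, "y": 20, "noise": 2},
        {"x": 20, "y": None, "noise": 3},
    ]
    for row in complete(sample, ["x", "y", "noise"], group_threshold=5, sub_threshold=1):
        print(row)
```

In this sketch the first pass uses a loose threshold over all properties, while the second pass uses a tighter threshold over only the retained properties, which is one way to obtain the smaller and more cohesive sub-groups the abstract describes.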