Optimizing insight generation in heterogeneous datasets
Abstract:
Embodiments relate to a system, computer program product, and method to merge two or more heterogeneous datasets. Seed attributes of each dataset that is the subject of the merge are identified. The seed attributes are derived from candidate attributes of the respective datasets. A correlation is assessed to create a set of mergeable attributes and a set of non-mergeable attributes. A cohesiveness characteristic is leveraged to iteratively identify one or more attributes from the set of non-mergeable attributes, and to amend the set of mergeable attributes with the attributes so identified. A merged dataset, based on the amended set of mergeable attributes and representing non-trivial similarities between the datasets, is formed as output.
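The flow described in the abstract can be illustrated with a minimal sketch for two datasets. Everything concrete here is an assumption made for illustration and is not taken from the source: the column names, the row-aligned numeric columns, the use of Pearson correlation, the two thresholds, and in particular the cohesiveness measure (the mean absolute correlation of a candidate attribute with the attributes already in the mergeable set). The abstract does not specify how correlation or cohesiveness is computed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def split_by_correlation(ds_a, ds_b, seed_pairs, threshold=0.8):
    """Partition seed attribute pairs into mergeable and non-mergeable sets."""
    mergeable, non_mergeable = [], []
    for a, b in seed_pairs:
        strong = abs(pearson(ds_a[a], ds_b[b])) >= threshold
        (mergeable if strong else non_mergeable).append((a, b))
    return mergeable, non_mergeable


def cohesiveness(pair, mergeable, ds_a):
    """Hypothetical measure: mean |correlation| with attributes already mergeable."""
    if not mergeable:
        return 0.0
    a = pair[0]
    return sum(abs(pearson(ds_a[a], ds_a[m])) for m, _ in mergeable) / len(mergeable)


def amend(mergeable, non_mergeable, ds_a, threshold=0.6):
    """Iteratively move cohesive pairs from the non-mergeable set to the mergeable set."""
    mergeable, remaining = list(mergeable), list(non_mergeable)
    changed = True
    while changed:  # repeat until a full pass promotes nothing
        changed = False
        for pair in list(remaining):
            if cohesiveness(pair, mergeable, ds_a) >= threshold:
                mergeable.append(pair)
                remaining.remove(pair)
                changed = True
    return mergeable, remaining


def merge(ds_a, ds_b, mergeable):
    """Form the merged dataset: one column of paired values per mergeable pair."""
    return {f"{a}|{b}": list(zip(ds_a[a], ds_b[b])) for a, b in mergeable}


# Illustrative data only; rows are assumed to be aligned by position.
ds_a = {"temp_c": [10, 20, 30, 40, 50], "load": [1, 2, 3, 4, 6], "noise": [3, 1, 4, 1, 5]}
ds_b = {"temp_f": [50, 68, 86, 104, 122], "usage": [5, 1, 4, 2, 3], "static": [4, 4, 5, 3, 4]}
seed_pairs = [("temp_c", "temp_f"), ("load", "usage"), ("noise", "static")]

mergeable, non_mergeable = split_by_correlation(ds_a, ds_b, seed_pairs)
amended, leftover = amend(mergeable, non_mergeable, ds_a)
merged = merge(ds_a, ds_b, amended)
print(sorted(merged))
```

In this toy run only the temperature pair passes the initial correlation test. The load/usage pair is weakly correlated across the datasets but is promoted in the amendment step because `load` tracks `temp_c` closely within its own dataset, while the noise/static pair stays in the non-mergeable set.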