Parallelization of node's fault tolerant record linkage using smart indexing and hierarchical clustering
Abstract:
Embodiments include a computer-implemented method including identifying, by a primary computer device, a plurality of records, each record having one or more attributes; standardizing, by the primary computer device, each of the plurality of records; assigning, by the primary computer device, an index to one or more of the one or more attributes; providing, by the primary computer device, instructions for clustering the standardized plurality of records in parallel into one or more clusters, each cluster including records having the same index, the one or more clusters being in a group; receiving, by the primary computer device, one or more groups, each group including one or more clusters sharing a same index; and linking one or more of the plurality of records in a cluster with another one or more of the plurality of records in another cluster within a same group.
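The steps in the abstract (standardize, index, cluster in parallel, then link across clusters within a group) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the record fields (`id`, `name`, `zip`), the index choice (first letter of the name plus the ZIP code), the exact-name clustering rule, the 0.85 similarity threshold, and all function names are hypothetical. A thread pool stands in for the secondary devices that would do the clustering in a distributed deployment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def standardize(record):
    # Lowercase string attributes and collapse runs of whitespace.
    return {
        k: " ".join(v.lower().split()) if isinstance(v, str) else v
        for k, v in record.items()
    }


def index_key(record):
    # Hypothetical index: first letter of the name plus the ZIP code.
    return (record["name"][:1], record["zip"])


def cluster_group(records):
    # Records in one group share an index; cluster them by exact
    # standardized name (an assumed, deliberately simple rule).
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["name"]].append(rec)
    return clusters


def link_group(clusters, threshold=0.85):
    # Link records across different clusters of the same group when
    # the cluster names are similar enough.
    names = sorted(clusters)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                links.extend(
                    tuple(sorted((x["id"], y["id"])))
                    for x in clusters[a]
                    for y in clusters[b]
                )
    return links


def run(records, workers=4):
    # Group standardized records by index, cluster each group in
    # parallel, then link clusters within each group.
    groups = defaultdict(list)
    for rec in map(standardize, records):
        groups[index_key(rec)].append(rec)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        clustered = list(pool.map(cluster_group, groups.values()))
    return sorted(
        link for clusters in clustered for link in link_group(clusters)
    )


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "Jon  Smith", "zip": "12345"},
        {"id": 2, "name": "jon smith", "zip": "12345"},
        {"id": 3, "name": "John Smith", "zip": "12345"},
        {"id": 4, "name": "Mary Jones", "zip": "99999"},
    ]
    print(run(sample))
```

Because comparisons happen only between records that share an index, each group can be processed independently, which is what makes the clustering step straightforward to run in parallel.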