Effectively fusing database tables

    Publication No.: US10853033B1

    Publication Date: 2020-12-01

    Application No.: US15729931

    Filing Date: 2017-10-11

    Applicant: Amperity, Inc.

    Abstract: The present disclosure relates to fusing multiple database tables together. The fields of the database tables may be normalized using semantic fields. Under a first approach, database tables are deduplicated by consolidating redundant records. This may be done by performing pairwise comparisons to identify related pairs of records and then clustering the related pairs of records. Then, the deduplicated database tables are merged by performing another pairwise comparison. Under a second approach, the database tables may be concatenated. Thereafter, records are subject to pairwise comparisons and then clustered to create a merged database table.
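
As an illustration of the second approach described in this abstract (concatenate, compare pairwise, then cluster), the following is a minimal sketch and not the patented implementation. The field mapping `SEMANTIC_FIELDS`, the match rule `is_match` (equal normalized email), the sample tables, and the use of a union-find structure for clustering are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical mapping from source column names to shared semantic fields.
SEMANTIC_FIELDS = {"email": "email", "e_mail": "email",
                   "name": "name", "full_name": "name"}


def normalize(record):
    """Rename columns to semantic fields and canonicalize the values."""
    return {SEMANTIC_FIELDS[k]: str(v).strip().lower()
            for k, v in record.items() if k in SEMANTIC_FIELDS}


def is_match(a, b):
    """Assumed match rule: two records are related if their emails are equal."""
    return bool(a.get("email")) and a.get("email") == b.get("email")


def cluster_records(records, match):
    """Compare every pair of records and group related pairs with union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if match(records[i], records[j]):
            ri, rj = find(i), find(j)
            if ri != rj:
                # Link the larger root under the smaller one.
                parent[max(ri, rj)] = min(ri, rj)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Two made-up source tables with different column names.
table_a = [{"name": "Ann Lee", "email": "ann@example.com"},
           {"name": "Ann Lee", "email": " ANN@example.com"}]
table_b = [{"full_name": "A. Lee", "e_mail": "ann@example.com"},
           {"full_name": "Bob Roy", "e_mail": "bob@example.com"}]

# Second approach: concatenate first, then compare and cluster.
records = [normalize(r) for r in table_a + table_b]
clusters = cluster_records(records, is_match)
print(clusters)  # [[0, 1, 2], [3]]
```

Each inner list holds the row indices of the concatenated table that would be consolidated into one merged record. Comparing all pairs is quadratic in the number of records, so a practical system would likely restrict the comparisons to candidate pairs.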

    Clustering of data records with hierarchical cluster IDs

    Publication No.: US10922337B2

    Publication Date: 2021-02-16

    Application No.: US16399219

    Filing Date: 2019-04-30

    Applicant: Amperity, Inc.

    Abstract: The present disclosure relates to clustering similar data records together in a hierarchical clustering scheme. Each tier in a cluster corresponds to a minimal match score, which reflects a degree of confidence. A hierarchical cluster ID is generated for respective data records. The hierarchical cluster ID may be made up of a series of values, wherein each value reflects a tier within the hierarchical clustering scheme. A user may enter a partial hierarchical cluster ID to select clusters associated with a lower confidence. Thus, in some embodiments, the hierarchical cluster ID is variable in length in a manner that corresponds to the tiers in the hierarchical clustering scheme.
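
The idea of a tiered cluster ID can be sketched as follows. This is an illustrative reading of the abstract, not the patented method: the pairwise `scores`, the three `thresholds`, the use of connected components at each tier, and the choice of the smallest record index as the label at each tier are all assumptions made for the example.

```python
def components(n, edges):
    """Label each of n records with the smallest index in its connected component."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(n)]


def hierarchical_ids(n, scores, thresholds):
    """Build one ID value per tier, ordered from lowest to highest minimum score."""
    ids = [[] for _ in range(n)]
    for t in sorted(thresholds):
        # Keep only the pairs whose match score reaches this tier's minimum.
        labels = components(n, [pair for pair, s in scores.items() if s >= t])
        for i in range(n):
            ids[i].append(labels[i])
    return [tuple(values) for values in ids]


def select(ids, partial):
    """Return the records whose ID starts with the given partial ID."""
    partial = tuple(partial)
    return [i for i, cid in enumerate(ids) if cid[:len(partial)] == partial]


# Made-up match scores between five records (indices 0..4).
scores = {(0, 1): 0.95, (1, 2): 0.75, (2, 3): 0.55, (3, 4): 0.20}
thresholds = [0.5, 0.7, 0.9]  # assumed minimum match score per tier

ids = hierarchical_ids(5, scores, thresholds)
print(ids)                      # [(0, 0, 0), (0, 0, 0), (0, 0, 2), (0, 3, 3), (4, 4, 4)]
print(select(ids, (0,)))        # [0, 1, 2, 3]  lowest confidence, largest cluster
print(select(ids, (0, 0)))      # [0, 1, 2]
print(select(ids, (0, 0, 0)))   # [0, 1]        highest confidence, smallest cluster
```

A shorter partial ID matches more records because it only constrains the low-confidence tiers, which is one way to read the abstract's remark that the ID is variable in length.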

    Clustering of data records with hierarchical cluster IDs

    Publication No.: US20200349174A1

    Publication Date: 2020-11-05

    Application No.: US16399219

    Filing Date: 2019-04-30

    Applicant: Amperity, Inc.

    Abstract: The present disclosure relates to clustering similar data records together in a hierarchical clustering scheme. Each tier in a cluster corresponds to a minimal match score, which reflects a degree of confidence. A hierarchical cluster ID is generated for respective data records. The hierarchical cluster ID may be made up of a series of values, wherein each value reflects a tier within the hierarchical clustering scheme. A user may enter a partial hierarchical cluster ID to select clusters associated with a lower confidence. Thus, in some embodiments, the hierarchical cluster ID is variable in length in a manner that corresponds to the tiers in the hierarchical clustering scheme.
