-
Publication No.: US12242514B2
Publication Date: 2025-03-04
Application No.: US17316293
Application Date: 2021-05-10
Applicant: AMPERITY, INC.
Inventor: Yan Yan, Stephen Keith Meyles, Graeme Andrew Kyle Roche, Jeffrey Allen Stokes, Carlos Minoru Sakoda, Dan Suciu
Abstract: The present disclosure relates to clustering similar data records together in a hierarchical clustering scheme. Each tier in a cluster corresponds to a minimal match score, which reflects a degree of confidence. In this respect, a higher confidence may lead to smaller sized clusters while a lower confidence may lead to larger sized clusters. Ordinal classification may be used to generate hierarchical clusters. In some embodiments, hierarchical clustering with conflict resolution is used to resolve user-defined hard conflicts in each tier of the clustering results.
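The tiered scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: it assumes pairwise match scores are already available (the ordinal classification step is not modeled), treats each tier's minimal match score as a plain threshold, and resolves user-defined hard conflicts by refusing any merge that would place a conflicting pair in the same cluster. The function tiered_clusters and its parameters are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def tiered_clusters(
    records: Iterable[str],
    scores: Dict[Pair, float],
    thresholds: List[float],
    conflicts: Iterable[Pair] = (),
) -> List[Dict[str, int]]:
    """Hypothetical sketch: return one {record: cluster label} map per tier.

    A pair is linked in a tier when its match score is at least that tier's
    minimal score. Links are applied in descending score order, and a merge is
    skipped if it would put a user-defined hard-conflict pair in one cluster.
    """
    ordered = list(records)
    forbidden = {frozenset(pair) for pair in conflicts}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    tiers = []
    for minimum in thresholds:
        # Union-find state: every record starts as its own cluster.
        parent = {r: r for r in ordered}
        members = {r: {r} for r in ordered}

        def find(x: str) -> str:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for (a, b), score in ranked:
            if score < minimum:
                break  # every remaining pair is below this tier's minimal score
            ra, rb = find(a), find(b)
            if ra == rb:
                continue
            # Hard-conflict check: would any conflicting pair end up together?
            if any(frozenset((x, y)) in forbidden
                   for x in members[ra] for y in members[rb]):
                continue
            parent[rb] = ra
            members[ra] |= members.pop(rb)

        # Relabel cluster roots as 0, 1, 2, ... in record order.
        labels: Dict[str, int] = {}
        tiers.append({r: labels.setdefault(find(r), len(labels)) for r in ordered})
    return tiers
```

Because pairs are processed in the same descending-score order for every tier, a lower threshold can only merge clusters already formed at a higher threshold, so the tiers stay nested: higher-confidence tiers give smaller clusters and lower-confidence tiers give larger ones.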
-
Publication No.: US11797487B2
Publication Date: 2023-10-24
Application No.: US17715204
Application Date: 2022-04-07
Applicant: AMPERITY, INC.
Inventor: Stephen Meyles, Yan Yan, Dan Suciu, Michael P. Fikes
IPC: G06F7/02, G06F16/00, G06F16/174, G06F16/28, G06F16/22, G06F40/197, G06F17/16
CPC classification number: G06F16/1748, G06F16/2272, G06F16/285, G06F40/197, G06F16/288, G06F17/16
Abstract: The present disclosure relates to optimizing one or more database tables that may include one or more redundant records. Records are clustered and assigned stable identifiers. In this manner, the underlying records within a cluster are not removed or deleted. As updates to the database are made, new clustering analyses are performed using the underlying records and any updates made. Newly identified clusters are reassigned stable identifiers.
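One way to keep cluster identifiers stable across reclustering runs is sketched below in Python. The inheritance rule (a new cluster reuses the previous ID held by most of its members, each old ID goes to at most one new cluster, and unmatched clusters get a fresh ID) is an assumption for illustration, not the rule claimed in the patent; assign_stable_ids and the "S<number>" ID format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, List, Set


def assign_stable_ids(
    previous: Dict[str, str],
    new_clusters: List[Set[str]],
    counter: Iterator[int],
) -> Dict[str, str]:
    """Hypothetical sketch: map every record to a stable cluster ID.

    previous maps record -> stable ID from the last clustering run (records are
    kept, never deleted). Each new cluster inherits the previous ID held by most
    of its members, provided no other new cluster has claimed it; otherwise a
    fresh ID is minted from counter, which must start above every issued number.
    """
    claimed: Set[str] = set()
    assignment: Dict[str, str] = {}
    # Larger clusters pick first; sorting by members makes ties deterministic.
    for cluster in sorted(new_clusters, key=lambda c: (-len(c), sorted(c))):
        votes = Counter(
            previous[r] for r in cluster
            if r in previous and previous[r] not in claimed
        )
        if votes:
            # Most common inherited ID; ties go to the lexicographically smallest.
            stable_id = min(votes, key=lambda sid: (-votes[sid], sid))
        else:
            stable_id = f"S{next(counter)}"
        claimed.add(stable_id)
        for r in cluster:
            assignment[r] = stable_id
    return assignment
```

Under this rule no record is dropped: in a simple split the larger part keeps the original ID, and in a merge the combined cluster keeps one of the merged clusters' IDs.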
-
Publication No.: US11003643B2
Publication Date: 2021-05-11
Application No.: US16399162
Application Date: 2019-04-30
Applicant: Amperity, Inc.
Inventor: Yan Yan, Stephen Keith Meyles, Graeme Andrew Kyle Roche, Jeffrey Allen Stokes, Carlos Minoru Sakoda, Dan Suciu
Abstract: The present disclosure relates to clustering similar data records together in a hierarchical clustering scheme. Each tier in a cluster corresponds to a minimal match score, which reflects a degree of confidence. In this respect, a higher confidence may lead to smaller sized clusters while a lower confidence may lead to larger sized clusters. Ordinal classification may be used to generate hierarchical clusters. In some embodiments, hierarchical clustering with conflict resolution is used to resolve user-defined hard conflicts in each tier of the clustering results.
-
Publication No.: US11301426B1
Publication Date: 2022-04-12
Application No.: US16675789
Application Date: 2019-11-06
Applicant: Amperity, Inc.
Inventor: Stephen Meyles, Yan Yan, Dan Suciu, Michael P. Fikes
IPC: G06F7/02, G06F16/00, G06F16/174, G06F16/22, G06F16/28, G06F40/197, G06F17/16
Abstract: The present disclosure relates to optimizing one or more database tables that may include one or more redundant records. Records are clustered and assigned stable identifiers. In this manner, the underlying records within a cluster are not removed or deleted. As updates to the database are made, new clustering analyses are performed using the underlying records and any updates made. Newly identified clusters are reassigned stable identifiers.
-
Publication No.: US11669301B2
Publication Date: 2023-06-06
Application No.: US17104868
Application Date: 2020-11-25
Applicant: AMPERITY, INC.
Inventor: Stephen Meyles, Yan Yan, Carlos Sakoda, Ian Wesley-Smith, Dan Suciu
IPC: G06F7/02, G06F16/00, G06F7/14, G06F16/2455, G06F16/215, G06F16/23, G06F16/242
CPC classification number: G06F7/14, G06F16/215, G06F16/2365, G06F16/244, G06F16/24556
Abstract: The present disclosure relates to fusing multiple database tables together. The fields of the database tables may be normalized using semantic fields. Under a first approach, database tables are deduplicated by consolidating redundant records. This may be done by performing pairwise comparisons to identify related pairs of records and then clustering the related pairs of records. Then, the deduplicated database tables are merged by performing another pairwise comparison. Under a second approach, the database tables may be concatenated. Thereafter, records are subjected to pairwise comparisons and then clustered to create a merged database table.
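The second approach (normalize to semantic fields, concatenate, compare pairwise, then cluster) can be sketched in Python as follows. The semantic field names (email, name, postal), the per-table column mappings, and the exact-equality match rule are all hypothetical; the rule stands in for whatever pairwise comparison the patent actually uses.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Record = Dict[str, str]


def to_semantic(rows: List[Record], field_map: Dict[str, str]) -> List[Record]:
    """Rename source columns to shared semantic fields; lower-case and trim values."""
    return [
        {field_map[col]: val.strip().lower() for col, val in row.items() if col in field_map}
        for row in rows
    ]


def is_match(a: Record, b: Record) -> bool:
    """Hypothetical match rule: same email, or same name and postal code."""
    if a.get("email") and a.get("email") == b.get("email"):
        return True
    key_a = (a.get("name"), a.get("postal"))
    return all(key_a) and key_a == (b.get("name"), b.get("postal"))


def fuse(tables: List[Tuple[List[Record], Dict[str, str]]]) -> List[List[Record]]:
    """Concatenate normalized tables, compare every pair, and cluster matches."""
    rows = [r for table, field_map in tables for r in to_semantic(table, field_map)]
    parent = list(range(len(rows)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Exhaustive pairwise comparison is O(n^2); candidate pairs are often
    # restricted (blocking) on large tables.
    for i, j in combinations(range(len(rows)), 2):
        if is_match(rows[i], rows[j]):
            parent[find(j)] = find(i)

    clusters: Dict[int, List[Record]] = {}
    for i, row in enumerate(rows):
        clusters.setdefault(find(i), []).append(row)
    return list(clusters.values())
```

The sketch stops at clustering; producing the merged table would additionally require collapsing each cluster into a single row.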
-
Publication No.: US10853033B1
Publication Date: 2020-12-01
Application No.: US15729931
Application Date: 2017-10-11
Applicant: Amperity, Inc.
Inventor: Stephen Meyles, Yan Yan, Carlos Sakoda, Ian Wesley-Smith, Dan Suciu
IPC: G06F7/02, G06F16/00, G06F7/14, G06F16/2455, G06F16/215, G06F16/23, G06F16/242
Abstract: The present disclosure relates to fusing multiple database tables together. The fields of the database tables may be normalized using semantic fields. Under a first approach, database tables are deduplicated by consolidating redundant records. This may be done by performing pairwise comparisons to identify related pairs of records and then clustering the related pairs of records. Then, the deduplicated database tables are merged by performing another pairwise comparison. Under a second approach, the database tables may be concatenated. Thereafter, records are subjected to pairwise comparisons and then clustered to create a merged database table.
-
Publication No.: US10922337B2
Publication Date: 2021-02-16
Application No.: US16399219
Application Date: 2019-04-30
Applicant: Amperity, Inc.
Inventor: Yan Yan, Stephen Keith Meyles, Graeme Andrew Kyle Roche, Jeffrey Allen Stokes, Carlos Minoru Sakoda, Dan Suciu
Abstract: The present disclosure relates to clustering similar data records together in a hierarchical clustering scheme. Each tier in a cluster corresponds to a minimal match score, which reflects a degree of confidence. A hierarchical cluster ID is generated for respective data records. The hierarchical cluster ID may be made up of a series of values, wherein each value reflects a tier within the hierarchical clustering scheme. A user may enter a partial hierarchical cluster ID to select clusters associated with a lower confidence. Thus, in some embodiments, the hierarchical cluster ID is variable in length in a manner that corresponds to the tiers in the hierarchical clustering scheme.
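The variable-length hierarchical cluster ID can be illustrated with a short Python sketch. It assumes per-tier cluster labels are already available, ordered from the lowest-confidence (coarsest) tier to the highest; each ID value numbers a record's cluster within its parent cluster, so a prefix of the ID names a coarser, lower-confidence cluster. The functions hierarchical_ids, format_id, and select and the dotted ID format are hypothetical.

```python
from typing import Dict, List, Set, Tuple

HierId = Tuple[int, ...]


def hierarchical_ids(tiers: List[Dict[str, int]]) -> Dict[str, HierId]:
    """Hypothetical sketch: tiers[0] is the coarsest tier, later tiers refine it."""
    ids: Dict[str, HierId] = {r: () for r in tiers[0]}
    for tier in tiers:
        # Within each parent prefix, number child clusters 0, 1, 2, ... as first seen.
        children: Dict[HierId, Dict[int, int]] = {}
        new_ids: Dict[str, HierId] = {}
        for record, prefix in ids.items():
            local = children.setdefault(prefix, {})
            value = local.setdefault(tier[record], len(local))
            new_ids[record] = prefix + (value,)
        ids = new_ids
    return ids


def format_id(hid: HierId) -> str:
    """Render an ID as dotted values, one value per tier."""
    return ".".join(str(v) for v in hid)


def select(ids: Dict[str, HierId], partial: str) -> Set[str]:
    """Return records whose ID starts with a (possibly partial) dotted ID."""
    prefix = tuple(int(p) for p in partial.split(".")) if partial else ()
    return {r for r, hid in ids.items() if hid[: len(prefix)] == prefix}
```

For example, with IDs 0.0, 0.0, 0.1, and 1.0, entering the partial ID 0 selects the first three records, which form one larger cluster at the lower-confidence tier.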
-
Publication No.: US20200349174A1
Publication Date: 2020-11-05
Application No.: US16399219
Application Date: 2019-04-30
Applicant: Amperity, Inc.
Inventor: Yan Yan, Stephen Keith Meyles, Graeme Andrew Kyle Roche, Jeffrey Allen Stokes, Carlos Minoru Sakoda, Dan Suciu
Abstract: The present disclosure relates to clustering similar data records together in a hierarchical clustering scheme. Each tier in a cluster corresponds to a minimal match score, which reflects a degree of confidence. A hierarchical cluster ID is generated for respective data records. The hierarchical cluster ID may be made up of a series of values, wherein each value reflects a tier within the hierarchical clustering scheme. A user may enter a partial hierarchical cluster ID to select clusters associated with a lower confidence. Thus, in some embodiments, the hierarchical cluster ID is variable in length in a manner that corresponds to the tiers in the hierarchical clustering scheme.
-
Publication No.: US20200349136A1
Publication Date: 2020-11-05
Application No.: US16399162
Application Date: 2019-04-30
Applicant: Amperity, Inc.
Inventor: Yan Yan, Stephen Keith Meyles, Graeme Andrew Kyle Roche, Jeffrey Allen Stokes, Carlos Minoru Sakoda, Dan Suciu
Abstract: The present disclosure relates to clustering similar data records together in a hierarchical clustering scheme. Each tier in a cluster corresponds to a minimal match score, which reflects a degree of confidence. In this respect, a higher confidence may lead to smaller sized clusters while a lower confidence may lead to larger sized clusters. Ordinal classification may be used to generate hierarchical clusters. In some embodiments, hierarchical clustering with conflict resolution is used to resolve user-defined hard conflicts in each tier of the clustering results.