-
Publication No.: US12242514B2
Publication date: 2025-03-04
Application No.: US17316293
Filing date: 2021-05-10
Applicant: AMPERITY, INC.
Inventor: Yan Yan , Stephen Keith Meyles , Graeme Andrew Kyle Roche , Jeffrey Allen Stokes , Carlos Minoru Sakoda , Dan Suciu
Abstract: The present disclosure relates to clustering similar data records together in a hierarchical clustering scheme. Each tier in a cluster corresponds to a minimal match score, which reflects a degree of confidence. In this respect, a higher confidence may lead to smaller clusters while a lower confidence may lead to larger clusters. Ordinal classification may be used to generate hierarchical clusters. In some embodiments, hierarchical clustering with conflict resolution is used to resolve user-defined hard conflicts in each tier of the clustering results.
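The tiering idea in this abstract can be illustrated with a minimal sketch. It assumes pairwise match scores are already available and forms each tier by linking every pair whose score meets that tier's minimal score (connected components via union-find). The function names, record IDs, and thresholds are hypothetical, and neither the ordinal classification nor the conflict resolution mentioned in the abstract is modeled here.

```python
from collections import defaultdict


def cluster_at_threshold(records, scored_pairs, min_score):
    """Group records linked by any pair scoring at least min_score."""
    parent = {r: r for r in records}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= min_score:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for r in records:
        groups[find(r)].add(r)
    return sorted(sorted(g) for g in groups.values())


def hierarchical_clusters(records, scored_pairs, tiers):
    """Return one clustering per tier, keyed by the tier's minimal score."""
    return {t: cluster_at_threshold(records, scored_pairs, t)
            for t in sorted(tiers, reverse=True)}


# Hypothetical records and match scores, for illustration only.
records = ["r1", "r2", "r3", "r4"]
scored_pairs = [("r1", "r2", 0.95), ("r2", "r3", 0.70), ("r3", "r4", 0.40)]
tiers = hierarchical_clusters(records, scored_pairs, [0.9, 0.6, 0.3])
for min_score, clusters in tiers.items():
    print(min_score, clusters)
```

In this toy input the 0.9 tier keeps only r1 and r2 together, while the 0.3 tier joins all four records, matching the abstract's point that higher confidence yields smaller clusters.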
-
Publication No.: US12198072B2
Publication date: 2025-01-14
Application No.: US18390803
Filing date: 2023-12-20
Applicant: AMPERITY, INC.
Inventor: Yan Yan , Aria Haghighi , Nicholas Resnick , Andrew Lim
IPC: G06N5/04 , G06F16/23 , G06F16/24 , G06N20/00 , G06Q30/0201 , G06Q30/01 , G06Q30/0202
Abstract: Disclosed are techniques for generating features to train a predictive model to predict a customer lifetime value or churn rate. In one embodiment, a method is disclosed comprising receiving a record that includes a plurality of fields and selecting a value associated with a selected field in the plurality of fields. The method then queries a lookup table comprising a mapping of values to aggregated statistics using the value and receives an aggregated statistic based on the querying. Next, the method generates a feature vector by annotating the record with the aggregated statistic. The method uses this feature vector as an input to a predictive model.
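The lookup-and-annotate step described in this abstract might look roughly like the sketch below. It assumes the aggregated statistic is a per-value mean computed from historical rows; the field names (`zip`, `spend`, `orders`), the default value, and both function names are hypothetical, and the predictive model itself is left out.

```python
from statistics import mean


def build_lookup(history, key_field, stat_field):
    """Map each value of key_field to the mean of stat_field (assumed statistic)."""
    grouped = {}
    for row in history:
        grouped.setdefault(row[key_field], []).append(row[stat_field])
    return {value: mean(stats) for value, stats in grouped.items()}


def featurize(record, key_field, numeric_fields, lookup, default=0.0):
    """Annotate the record's numeric fields with the looked-up statistic."""
    aggregated = lookup.get(record[key_field], default)
    return [record[f] for f in numeric_fields] + [aggregated]


# Hypothetical historical data and incoming record.
history = [
    {"zip": "98101", "spend": 100.0},
    {"zip": "98101", "spend": 300.0},
    {"zip": "10001", "spend": 50.0},
]
lookup = build_lookup(history, "zip", "spend")
record = {"zip": "98101", "orders": 3}
features = featurize(record, "zip", ["orders"], lookup)
print(features)
```

The resulting list would then be passed to whatever model predicts lifetime value or churn. Precomputing the table keeps the per-record work to a single dictionary lookup.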
-
Publication No.: US11442694B1
Publication date: 2022-09-13
Application No.: US16787576
Filing date: 2020-02-11
Applicant: Amperity, Inc.
Inventor: Derek Slager , Stephen Meyles , Yan Yan , Carlos Sakoda
IPC: G06F7/02 , G06F16/00 , G06F7/32 , G06F16/2455 , G06F16/23 , G06F16/24 , G06F16/215 , G06F7/14
Abstract: The present disclosure relates to merging database tables. Systems and methods may involve performing a comparison between a first set of records and a second set of records and identifying a plurality of record pairs based on the comparison. Each record pair may comprise a record in the first set of records and a record in the second set of records. In addition, a feature signature may be generated for each record pair by comparing field values in each record pair. The feature signature may be classified to identify at least one related record pair. A merged database table may be generated such that it comprises the at least one related record pair and a set of unique records selected from the first set of records and the second set of records.
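As a rough illustration of the flow in this abstract, the sketch below builds a feature signature as a tuple of per-field equality flags and classifies it with a simple rule (at least two matching fields). Both the signature layout and the rule are assumptions standing in for whatever comparison features and classifier the patent actually uses; the field list and all names are hypothetical.

```python
FIELDS = ("name", "email", "zip")  # hypothetical comparison fields


def feature_signature(rec_a, rec_b):
    """One boolean per field: True when both records hold the same non-empty value."""
    return tuple(
        rec_a.get(f) is not None and rec_a.get(f) == rec_b.get(f)
        for f in FIELDS
    )


def is_related(signature, min_matches=2):
    """Stand-in classifier: related when enough fields agree."""
    return sum(signature) >= min_matches


def merge_tables(table_a, table_b):
    """Return (related record pairs, records with no related counterpart)."""
    related, matched_a, matched_b = [], set(), set()
    for i, a in enumerate(table_a):
        for j, b in enumerate(table_b):
            if is_related(feature_signature(a, b)):
                related.append((a, b))
                matched_a.add(i)
                matched_b.add(j)
    unique = [a for i, a in enumerate(table_a) if i not in matched_a]
    unique += [b for j, b in enumerate(table_b) if j not in matched_b]
    return related, unique


# Hypothetical input tables.
table_a = [
    {"name": "Ann Lee", "email": "ann@example.com", "zip": "98101"},
    {"name": "Bob Roy", "email": "bob@example.com", "zip": "10001"},
]
table_b = [
    {"name": "Ann Lee", "email": "ann@example.com", "zip": "98109"},
    {"name": "Cy Doe", "email": "cy@example.com", "zip": "10001"},
]
related, unique = merge_tables(table_a, table_b)
print(len(related), [r["name"] for r in unique])
```

The all-pairs loop is quadratic and only suitable for a toy example; a practical system would first narrow the candidate pairs.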
-
Publication No.: US11301426B1
Publication date: 2022-04-12
Application No.: US16675789
Filing date: 2019-11-06
Applicant: Amperity, Inc.
Inventor: Stephen Meyles , Yan Yan , Dan Suciu , Michael P. Fikes
IPC: G06F7/02 , G06F16/00 , G06F16/174 , G06F16/22 , G06F16/28 , G06F40/197 , G06F17/16
Abstract: The present disclosure relates to optimizing one or more database tables that may include one or more redundant records. Records are clustered and assigned stable identifiers. In this manner, the underlying records within a cluster are not removed or deleted. As updates to the database are made, new clustering analyses are performed using the underlying records and any updates made. Newly identified clusters are reassigned stable identifiers.
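The abstract does not say how identifiers are carried across clustering runs, so the sketch below is only one plausible reading: each new cluster inherits the previous identifier held by most of its records, and a cluster with no prior records receives a fresh identifier. The majority-vote rule, the `c<N>` identifier format, and the function name are all assumptions.

```python
import itertools


def assign_stable_ids(new_clusters, previous_ids, id_counter):
    """Map each record to a cluster ID, reusing earlier IDs where possible.

    previous_ids maps record -> ID from the prior clustering run.
    """
    assigned, used = {}, set()
    # Larger clusters choose first so they keep the ID most records already know.
    for cluster in sorted(new_clusters, key=len, reverse=True):
        votes = {}
        for rec in cluster:
            sid = previous_ids.get(rec)
            if sid is not None and sid not in used:
                votes[sid] = votes.get(sid, 0) + 1
        if votes:
            # Sorting first makes tie-breaking deterministic.
            sid = max(sorted(votes), key=votes.get)
        else:
            sid = f"c{next(id_counter)}"
        used.add(sid)
        for rec in cluster:
            assigned[rec] = sid
    return assigned


# Hypothetical prior assignment and a new clustering after an update.
previous_ids = {"r1": "c1", "r2": "c1", "r3": "c2"}
new_clusters = [{"r1", "r2", "r4"}, {"r3"}, {"r5"}]
assigned = assign_stable_ids(new_clusters, previous_ids, itertools.count(3))
print(assigned)
```

Note that the underlying records are never deleted: the function only produces a record-to-identifier mapping, which is consistent with the abstract's statement that records within a cluster are preserved.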
-
Publication No.: US10599395B1
Publication date: 2020-03-24
Application No.: US15729990
Filing date: 2017-10-11
Applicant: Amperity, Inc.
Inventor: Derek Slager , Stephen Meyles , Yan Yan , Carlos Sakoda
IPC: G06F7/02 , G06F16/00 , G06F7/32 , G06F16/2455 , G06F7/14 , G06F16/23 , G06F16/24 , G06F16/215
Abstract: The present disclosure relates to dynamically merging database tables according to user-specified parameters. A user may specify a threshold confidence level that relates to a likelihood that two database records represent the same real-world entity. In addition, a user may specify a merge rule, such as the desired fields or a manner for consolidating the variations of the information in the desired fields from the related records. The original database tables are preserved so that users can iteratively create new dynamically merged database tables by varying the parameters.
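A minimal sketch of the parameterized merge described in this abstract follows. It assumes pairwise confidence scores already exist, groups records whose score meets the user's threshold, and applies a user-supplied merge rule to each group. The `most_recent` rule, the `updated` field, and all other names are hypothetical. The source records are never mutated, so the merge can be rerun with different parameters.

```python
def dynamic_merge(records, scored_pairs, threshold, merge_rule):
    """Build a merged view; records maps record ID -> field dict and is left untouched."""
    group = {rid: rid for rid in records}
    for a, b, score in scored_pairs:
        if score >= threshold:
            old, new = group[a], group[b]
            # Relabel every member of a's group with b's group label.
            for rid in group:
                if group[rid] == old:
                    group[rid] = new
    clusters = {}
    for rid, label in group.items():
        clusters.setdefault(label, []).append(records[rid])
    return [merge_rule(members) for members in clusters.values()]


def most_recent(members):
    """Example merge rule: keep a copy of the most recently updated record."""
    return dict(max(members, key=lambda r: r["updated"]))


# Hypothetical source table and pairwise confidence scores.
records = {
    "r1": {"email": "a@old.example", "updated": 1},
    "r2": {"email": "a@new.example", "updated": 2},
    "r3": {"email": "b@example.com", "updated": 1},
}
scored_pairs = [("r1", "r2", 0.8), ("r2", "r3", 0.3)]

strict = dynamic_merge(records, scored_pairs, 0.9, most_recent)
medium = dynamic_merge(records, scored_pairs, 0.5, most_recent)
loose = dynamic_merge(records, scored_pairs, 0.2, most_recent)
print(len(strict), len(medium), len(loose))
```

Raising or lowering the threshold changes how many rows the merged view contains without touching the source table.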
-
Publication No.: US12013855B2
Publication date: 2024-06-18
Application No.: US18313753
Filing date: 2023-05-08
Applicant: AMPERITY, INC.
Inventor: Yan Yan , Aria Haghighi , Joseph Christianson
IPC: G06F16/2453 , G06F16/2457 , G06F16/28
CPC classification number: G06F16/24542 , G06F16/24578 , G06F16/285
Abstract: Disclosed are techniques for trimming large clusters of related records. In one embodiment, a method is disclosed comprising receiving a set of clusters, each cluster in the set including a plurality of records. The method extracts an oversized cluster from the set of clusters and performs a breadth-first search (BFS) on the oversized cluster to generate a list of visited records. The method terminates the BFS upon determining that the size of the list of visited records exceeds a maximum size, generates a new cluster from the list of visited records, and adds the new cluster to the set of clusters. By recursively performing BFS traversal over the oversized cluster and extracting smaller new clusters from it, the oversized cluster is eventually partitioned into a set of sub-clusters, each smaller than the predefined threshold.
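The BFS-based trimming can be sketched as follows. The sketch assumes the oversized cluster is given as an adjacency map of related records and stops each search as soon as the visited list reaches `max_size`, so every extracted piece holds at most `max_size` records; the exact stopping condition in the patent may differ. It uses a loop in place of recursion, and the function name and sample graph are hypothetical.

```python
from collections import deque


def trim_cluster(adjacency, max_size):
    """Split one oversized cluster into pieces of at most max_size records.

    adjacency maps each record to the set of records it is linked to.
    """
    remaining = set(adjacency)
    pieces = []
    while remaining:
        start = min(remaining)  # deterministic choice of BFS root
        visited, seen = [start], {start}
        queue = deque([start])
        while queue and len(visited) < max_size:
            node = queue.popleft()
            for neighbor in sorted(adjacency[node]):
                if neighbor in remaining and neighbor not in seen:
                    seen.add(neighbor)
                    visited.append(neighbor)
                    queue.append(neighbor)
                    if len(visited) >= max_size:
                        break  # size limit reached, stop expanding
        pieces.append(visited)
        remaining -= seen  # extracted records leave the oversized cluster
    return pieces


# Hypothetical chain of five linked records: a-b-c-d-e.
adjacency = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
pieces = trim_cluster(adjacency, max_size=2)
print(pieces)
```

Because BFS visits near neighbors first, each extracted piece tends to keep closely linked records together.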
-
Publication No.: US11797487B2
Publication date: 2023-10-24
Application No.: US17715204
Filing date: 2022-04-07
Applicant: AMPERITY, INC.
Inventor: Stephen Meyles , Yan Yan , Dan Suciu , Michael P. Fikes
IPC: G06F7/02 , G06F16/00 , G06F16/174 , G06F16/28 , G06F16/22 , G06F40/197 , G06F17/16
CPC classification number: G06F16/1748 , G06F16/2272 , G06F16/285 , G06F40/197 , G06F16/288 , G06F17/16
Abstract: The present disclosure relates to optimizing one or more database tables that may include one or more redundant records. Records are clustered and assigned stable identifiers. In this manner, the underlying records within a cluster are not removed or deleted. As updates to the database are made, new clustering analyses are performed using the underlying records and any updates made. Newly identified clusters are reassigned stable identifiers.
-
Publication No.: US11003643B2
Publication date: 2021-05-11
Application No.: US16399162
Filing date: 2019-04-30
Applicant: Amperity, Inc.
Inventor: Yan Yan , Stephen Keith Meyles , Graeme Andrew Kyle Roche , Jeffrey Allen Stokes , Carlos Minoru Sakoda , Dan Suciu
Abstract: The present disclosure relates to clustering similar data records together in a hierarchical clustering scheme. Each tier in a cluster corresponds to a minimal match score, which reflects a degree of confidence. In this respect, a higher confidence may lead to smaller clusters while a lower confidence may lead to larger clusters. Ordinal classification may be used to generate hierarchical clusters. In some embodiments, hierarchical clustering with conflict resolution is used to resolve user-defined hard conflicts in each tier of the clustering results.
-
Publication No.: US11704315B1
Publication date: 2023-07-18
Application No.: US16938233
Filing date: 2020-07-24
Applicant: Amperity, Inc.
Inventor: Yan Yan , Aria Haghighi , Joseph Christianson
IPC: G06F16/2453 , G06F16/28 , G06F16/2457
CPC classification number: G06F16/24542 , G06F16/285 , G06F16/24578
Abstract: Disclosed are techniques for trimming large clusters of related records. In one embodiment, a method is disclosed comprising receiving a set of clusters, each cluster in the set including a plurality of records. The method extracts an oversized cluster from the set of clusters and performs a breadth-first search (BFS) on the oversized cluster to generate a list of visited records. The method terminates the BFS upon determining that the size of the list of visited records exceeds a maximum size, generates a new cluster from the list of visited records, and adds the new cluster to the set of clusters. By recursively performing BFS traversal over the oversized cluster and extracting smaller new clusters from it, the oversized cluster is eventually partitioned into a set of sub-clusters, each smaller than the predefined threshold.
-
Publication No.: US11669301B2
Publication date: 2023-06-06
Application No.: US17104868
Filing date: 2020-11-25
Applicant: AMPERITY, INC.
Inventor: Stephen Meyles , Yan Yan , Carlos Sakoda , Ian Wesley-Smith , Dan Suciu
IPC: G06F7/02 , G06F16/00 , G06F7/14 , G06F16/2455 , G06F16/215 , G06F16/23 , G06F16/242
CPC classification number: G06F7/14 , G06F16/215 , G06F16/2365 , G06F16/244 , G06F16/24556
Abstract: The present disclosure relates to fusing multiple database tables together. The fields of the database tables may be normalized using semantic fields. Under a first approach, database tables are deduplicated by consolidating redundant records. This may be done by performing pairwise comparisons to identify related pairs of records and then clustering the related pairs of records. Then, the deduplicated database tables are merged by performing another pairwise comparison. Under a second approach, the database tables may be concatenated. Thereafter, records are subject to pairwise comparisons and then clustered to create a merged database table.
-