-
Publication number: US11797487B2
Publication date: 2023-10-24
Application number: US17715204
Application date: 2022-04-07
Applicant: AMPERITY, INC.
Inventor: Stephen Meyles , Yan Yan , Dan Suciu , Michael P. Fikes
IPC: G06F7/02 , G06F16/00 , G06F16/174 , G06F16/28 , G06F16/22 , G06F40/197 , G06F17/16
CPC classification number: G06F16/1748 , G06F16/2272 , G06F16/285 , G06F40/197 , G06F16/288 , G06F17/16
Abstract: The present disclosure relates to optimizing one or more database tables that may include one or more redundant records. Records are clustered and assigned stable identifiers. In this manner, the underlying records within a cluster are not removed or deleted. As updates to the database are made, new clustering analyses are performed using the underlying records and any updates made. Newly identified clusters are reassigned stable identifiers.
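One way to picture stable identifiers surviving a re-clustering run is sketched below. This is an illustration only, not the patented method: the function name `assign_stable_ids`, the majority-vote reuse rule, and the `c<number>` identifier format are all assumptions introduced here.

```python
from collections import Counter
from itertools import count


def assign_stable_ids(new_clusters, previous_ids, next_id):
    """Give each new cluster an identifier, reusing an old one where possible.

    new_clusters: list of sets of record keys from the latest clustering run.
    previous_ids: dict mapping record key -> identifier from the prior run.
    next_id: iterator yielding fresh numbers for clusters with no usable history.
    """
    assigned = {}
    used = set()
    # Larger clusters get first claim on a previously issued identifier.
    for cluster in sorted(new_clusters, key=len, reverse=True):
        votes = Counter(previous_ids[r] for r in cluster if r in previous_ids)
        chosen = next((sid for sid, _ in votes.most_common() if sid not in used), None)
        if chosen is None:
            chosen = f"c{next(next_id)}"  # hypothetical identifier format
        used.add(chosen)
        for record in cluster:
            assigned[record] = chosen
    return assigned


previous = {"r1": "c1", "r2": "c1", "r3": "c2"}
clusters = [{"r1", "r2", "r4"}, {"r3"}, {"r5"}]
result = assign_stable_ids(clusters, previous, count(3))
```

The underlying records are never deleted here; only the mapping from record to cluster identifier is recomputed, which is consistent with the abstract's description.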
-
Publication number: US11972228B2
Publication date: 2024-04-30
Application number: US17930915
Application date: 2022-09-09
Applicant: AMPERITY, INC.
Inventor: Derek Slager , Stephen Meyles , Yan Yan , Carlos Sakoda
IPC: G06F7/02 , G06F7/32 , G06F16/00 , G06F16/2455 , G06F7/14 , G06F16/215 , G06F16/23 , G06F16/24
CPC classification number: G06F7/32 , G06F16/24556 , G06F7/14 , G06F16/215 , G06F16/2365 , G06F16/24
Abstract: The present disclosure relates to merging database tables. Systems and methods may involve performing a comparison between the first set of records and the second set of records and identifying a plurality of record pairs based on the comparison. Each record pair may comprise a record in the first set of records and a record in the second set of records. In addition, a feature signature may be generated for each record pair by comparing field values in each record pair. The feature signature may be classified to identify at least one related record pair. A merged database table may be generated such that it comprises the at least one related record pair and comprises a set of unique records selected from the first set of records and the second set of records.
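The pair comparison described in this abstract can be sketched roughly as follows. The field list, the three-symbol signature encoding (`E`, `M`, `D`), and the rule-based `is_related` classifier are assumptions made for illustration; the abstract does not specify how signatures are encoded or classified.

```python
from itertools import product

FIELDS = ("name", "email", "zip")  # hypothetical field set


def feature_signature(a, b):
    """One symbol per field: E = equal, M = missing on either side, D = different."""
    symbols = []
    for field in FIELDS:
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            symbols.append("M")
        elif va.strip().lower() == vb.strip().lower():
            symbols.append("E")
        else:
            symbols.append("D")
    return "".join(symbols)


def is_related(signature):
    """Stand-in classifier: emails match, or both name and zip match."""
    return signature[1] == "E" or (signature[0] == "E" and signature[2] == "E")


def merge_tables(first, second):
    """Return (related pairs, records that matched nothing in the other table)."""
    related, matched_first, matched_second = [], set(), set()
    for (i, a), (j, b) in product(enumerate(first), enumerate(second)):
        if is_related(feature_signature(a, b)):
            related.append((a, b))
            matched_first.add(i)
            matched_second.add(j)
    unique = [r for i, r in enumerate(first) if i not in matched_first]
    unique += [r for j, r in enumerate(second) if j not in matched_second]
    return related, unique


first = [
    {"name": "Ann Lee", "email": "ann@x.com", "zip": "98101"},
    {"name": "Bob Ray", "email": "bob@x.com", "zip": "10001"},
]
second = [
    {"name": "A. Lee", "email": "ANN@x.com", "zip": ""},
    {"name": "Cy Orr", "email": "cy@x.com", "zip": "60601"},
]
related_pairs, unique_records = merge_tables(first, second)
```

Comparing every record in one table against every record in the other is quadratic; a practical system would likely restrict the candidate pairs first, but that step is outside what the abstract states.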
-
Publication number: US11442694B1
Publication date: 2022-09-13
Application number: US16787576
Application date: 2020-02-11
Applicant: Amperity, Inc.
Inventor: Derek Slager , Stephen Meyles , Yan Yan , Carlos Sakoda
IPC: G06F7/02 , G06F16/00 , G06F7/32 , G06F16/2455 , G06F16/23 , G06F16/24 , G06F16/215 , G06F7/14
Abstract: The present disclosure relates to merging database tables. Systems and methods may involve performing a comparison between the first set of records and the second set of records and identifying a plurality of record pairs based on the comparison. Each record pair may comprise a record in the first set of records and a record in the second set of records. In addition, a feature signature may be generated for each record pair by comparing field values in each record pair. The feature signature may be classified to identify at least one related record pair. A merged database table may be generated such that it comprises the at least one related record pair and comprises a set of unique records selected from the first set of records and the second set of records.
-
Publication number: US11301426B1
Publication date: 2022-04-12
Application number: US16675789
Application date: 2019-11-06
Applicant: Amperity, Inc.
Inventor: Stephen Meyles , Yan Yan , Dan Suciu , Michael P. Fikes
IPC: G06F7/02 , G06F16/00 , G06F16/174 , G06F16/22 , G06F16/28 , G06F40/197 , G06F17/16
Abstract: The present disclosure relates to optimizing one or more database tables that may include one or more redundant records. Records are clustered and assigned stable identifiers. In this manner, the underlying records within a cluster are not removed or deleted. As updates to the database are made, new clustering analyses are performed using the underlying records and any updates made. Newly identified clusters are reassigned stable identifiers.
-
Publication number: US10599395B1
Publication date: 2020-03-24
Application number: US15729990
Application date: 2017-10-11
Applicant: Amperity, Inc.
Inventor: Derek Slager , Stephen Meyles , Yan Yan , Carlos Sakoda
IPC: G06F7/02 , G06F16/00 , G06F7/32 , G06F16/2455 , G06F7/14 , G06F16/23 , G06F16/24 , G06F16/215
Abstract: The present disclosure relates to dynamically merging database tables according to user-specified parameters. A user may specify a threshold confidence level that relates to a likelihood that two database records represent the same real-world entity. In addition, a user may specify a merge rule such as desired fields or a manner for consolidating the variations of the information in desired fields from the related records. The original database tables are preserved so that users can iteratively create new dynamically merged database tables by varying the parameters.
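A rough sketch of threshold-driven merging with per-field merge rules follows. The function names, the `(i, j, confidence)` pair format, and the two rules `"longest"` and `"first"` are hypothetical; the abstract only says that a threshold and a merge rule are user-specified and that the source tables are left intact.

```python
def merge_value(values, rule):
    """Consolidate one field across related records using a named rule."""
    present = [v for v in values if v]
    if not present:
        return None
    if rule == "longest":
        return max(present, key=len)
    return present[0]  # default rule "first": earliest non-empty value


def dynamic_merge(records, scored_pairs, threshold, merge_rules):
    """Build a merged table without modifying `records`.

    scored_pairs: iterable of (i, j, confidence) over indexes into records.
    merge_rules: dict mapping output field -> rule name.
    """
    parent = list(range(len(records)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, confidence in scored_pairs:
        if confidence >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for idx, record in enumerate(records):
        groups.setdefault(find(idx), []).append(record)

    return [
        {field: merge_value([m.get(field) for m in members], rule)
         for field, rule in merge_rules.items()}
        for members in groups.values()
    ]


records = [
    {"name": "Ann", "email": "ann@x.com"},
    {"name": "Ann Lee", "email": None},
    {"name": "Bob", "email": "bob@x.com"},
]
scored_pairs = [(0, 1, 0.92), (1, 2, 0.40)]
rules = {"name": "longest", "email": "first"}
strict = dynamic_merge(records, scored_pairs, 0.9, rules)
loose = dynamic_merge(records, scored_pairs, 0.3, rules)
```

Because `records` is only read, the same inputs can be merged repeatedly with different thresholds or rules, which mirrors the iterative use the abstract describes.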
-
Publication number: US11669301B2
Publication date: 2023-06-06
Application number: US17104868
Application date: 2020-11-25
Applicant: AMPERITY, INC.
Inventor: Stephen Meyles , Yan Yan , Carlos Sakoda , Ian Wesley-Smith , Dan Suciu
IPC: G06F7/02 , G06F16/00 , G06F7/14 , G06F16/2455 , G06F16/215 , G06F16/23 , G06F16/242
CPC classification number: G06F7/14 , G06F16/215 , G06F16/2365 , G06F16/244 , G06F16/24556
Abstract: The present disclosure relates to fusing multiple database tables together. The fields of the database tables may be normalized using semantic fields. Under a first approach, database tables are deduplicated by consolidating redundant records. This may be done by performing pairwise comparisons to identify related pairs of records and then clustering the related pairs of records. Then, the deduplicated database tables are merged by performing another pairwise comparison. Under a second approach, the database tables may be concatenated. Thereafter, records are subject to pairwise comparisons and then clustered to create a merged database table.
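The second approach (normalize, concatenate, compare pairwise, cluster) might look roughly like the sketch below. The table names, the column-to-semantic-field mappings, and the email-equality comparison are placeholders invented for this example, and clustering is done with a plain connected-components pass rather than any method confirmed by the source.

```python
from itertools import combinations

# Hypothetical mappings from source column names to shared semantic fields.
SEMANTIC_MAPS = {
    "crm": {"full_name": "name", "email_addr": "email"},
    "orders": {"customer": "name", "contact_email": "email"},
}


def normalize(table_name, rows):
    """Rename each row's columns to their semantic field names."""
    mapping = SEMANTIC_MAPS[table_name]
    return [{mapping[c]: v for c, v in row.items() if c in mapping} for row in rows]


def related(a, b):
    """Placeholder comparison: same non-empty email, ignoring case."""
    ea = (a.get("email") or "").lower()
    eb = (b.get("email") or "").lower()
    return ea != "" and ea == eb


def cluster(records):
    """Link related pairs, then collect connected components of record indexes."""
    neighbors = {i: set() for i in range(len(records))}
    for i, j in combinations(range(len(records)), 2):
        if related(records[i], records[j]):
            neighbors[i].add(j)
            neighbors[j].add(i)
    seen, clusters = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(neighbors[node] - seen)
        clusters.append(sorted(component))
    return clusters


def fuse(tables):
    """Concatenate normalized tables, then cluster the combined records."""
    combined = []
    for name, rows in tables.items():
        combined.extend(normalize(name, rows))
    return combined, cluster(combined)


tables = {
    "crm": [
        {"full_name": "Ann Lee", "email_addr": "ann@x.com"},
        {"full_name": "Bob Ray", "email_addr": "bob@x.com"},
    ],
    "orders": [{"customer": "A Lee", "contact_email": "ANN@X.COM"}],
}
combined, clusters = fuse(tables)
```

Under the first approach in the abstract, `cluster` would instead run on each table separately before a further pairwise comparison across the deduplicated tables.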
-
Publication number: US11308130B1
Publication date: 2022-04-19
Application number: US16678841
Application date: 2019-11-08
Applicant: Amperity, Inc.
Inventor: Yan Yan , Stephen Meyles , Mona Akmal , Michael P. Fikes
Abstract: The present disclosure relates to evaluating whether two data records reflect the same entity using a classifier in the absence of ground truth. Without ground truth, it is difficult to determine the precision or recall of a classifier. The present disclosure generates a list comprising a series of unique feature signatures and a set of sample record pairs for each unique feature signature. In some embodiments, users may provide labels for the set of sample record pairs for each unique feature signature.
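A minimal sketch of the output this abstract describes, one row per unique feature signature with a bounded sample of record pairs for a reviewer to label, is given below. The function name, the row layout, and the use of seeded random sampling are assumptions; the abstract does not say how samples are chosen.

```python
import random
from collections import defaultdict


def sample_pairs_by_signature(pairs_with_signatures, sample_size, seed=0):
    """Group record pairs by signature and keep at most sample_size per group."""
    rng = random.Random(seed)  # seeded so repeated runs pick the same samples
    by_signature = defaultdict(list)
    for pair, signature in pairs_with_signatures:
        by_signature[signature].append(pair)

    report = []
    for signature in sorted(by_signature):
        pairs = by_signature[signature]
        if len(pairs) <= sample_size:
            chosen = list(pairs)
        else:
            chosen = rng.sample(pairs, sample_size)
        report.append({
            "signature": signature,
            "pair_count": len(pairs),
            "samples": chosen,
            "labels": [None] * len(chosen),  # to be filled in by a reviewer
        })
    return report


data = [
    (("r1", "r2"), "EEM"),
    (("r3", "r4"), "EEM"),
    (("r5", "r6"), "EEM"),
    (("r7", "r8"), "DDM"),
]
report = sample_pairs_by_signature(data, sample_size=2)
```

Labels collected per signature could then serve as a stand-in for ground truth when judging how the classifier treats each signature, though how they are used is not stated in the abstract.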
-
Publication number: US10853033B1
Publication date: 2020-12-01
Application number: US15729931
Application date: 2017-10-11
Applicant: Amperity, Inc.
Inventor: Stephen Meyles , Yan Yan , Carlos Sakoda , Ian Wesley-Smith , Dan Suciu
IPC: G06F7/02 , G06F16/00 , G06F7/14 , G06F16/2455 , G06F16/215 , G06F16/23 , G06F16/242
Abstract: The present disclosure relates to fusing multiple database tables together. The fields of the database tables may be normalized using semantic fields. Under a first approach, database tables are deduplicated by consolidating redundant records. This may be done by performing pairwise comparisons to identify related pairs of records and then clustering the related pairs of records. Then, the deduplicated database tables are merged by performing another pairwise comparison. Under a second approach, the database tables may be concatenated. Thereafter, records are subject to pairwise comparisons and then clustered to create a merged database table.
-
Publication number: US10509809B1
Publication date: 2019-12-17
Application number: US15729960
Application date: 2017-10-11
Applicant: Amperity, Inc.
Inventor: Yan Yan , Stephen Meyles , Mona Akmal , Michael P. Fikes
Abstract: The present disclosure relates to evaluating whether two data records reflect the same entity using a classifier in the absence of ground truth. Without ground truth, it is difficult to determine the precision or recall of a classifier. The present disclosure generates output data comprising a list of unique signatures generated from a set of records that are compared with each other. The output data may also comprise corresponding record pairs limited to a predetermined sample size for each unique feature signature.
-