Invention Grant
- Patent Title: Generating overlap estimations between high-volume digital data sets based on multiple sketch vector similarity estimators
- Application No.: US17818974
- Application Date: 2022-08-10
- Publication No.: US11720592B2
- Publication Date: 2023-08-08
- Inventor: Anup Rao, Tung Mai, Matvey Kapilevich
- Applicant: Adobe Inc.
- Applicant Address: US CA San Jose
- Assignee: Adobe Inc.
- Current Assignee: Adobe Inc.
- Current Assignee Address: US CA San Jose
- Agency: Keller Preece PLLC
- Main IPC: G06F16/00
- IPC: G06F16/00; G06F16/26; G06T11/20; G06F16/28

Abstract:
The present disclosure relates to systems, methods, and non-transitory computer-readable media that estimate the overlap between sets of data samples. In particular, in one or more embodiments, the disclosed systems utilize a sketch-based sampling routine and a flexible, accurate estimator to determine the overlap (e.g., the intersection) between sets of data samples. For example, in some implementations, the disclosed systems generate a sketch vector—such as a one permutation hashing vector—for each set of data samples. The disclosed systems further compare the sketch vectors to determine an equal bin similarity estimator, a lesser bin similarity estimator, and a greater bin similarity estimator. The disclosed systems utilize one or more of the determined similarity estimators in generating an overlap estimation for the sets of data samples.
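The bin-wise comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented estimator: the abstract does not give the formulas that combine the equal, lesser, and greater bin counts, so this example only tallies those counts and then applies the standard one permutation hashing Jaccard estimate (matching bins divided by bins that are not jointly empty). The function names (`oph_sketch`, `compare_bins`, `estimate_overlap`), the BLAKE2b hash, the bin count of 64, the treatment of an empty bin as larger than any value, and the assumption that both set sizes are known are all illustrative choices, not details taken from the patent.

```python
import hashlib


def _hash64(item: str) -> int:
    """Map an item to a deterministic 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def oph_sketch(items, num_bins=64):
    """Build a one permutation hashing sketch.

    Each item is hashed once; the hash selects a bin, and the bin keeps the
    smallest value seen. Bins that receive no item stay None.
    """
    sketch = [None] * num_bins
    for item in items:
        h = _hash64(item)
        bin_index = h % num_bins   # which bin the item falls into
        value = h // num_bins      # value stored within that bin
        if sketch[bin_index] is None or value < sketch[bin_index]:
            sketch[bin_index] = value
    return sketch


def compare_bins(sketch_a, sketch_b):
    """Count bins where A's value is equal to, less than, or greater than B's.

    Assumption: a bin that is empty in only one sketch counts as holding a
    value larger than any real value. Bins empty in both are tallied apart.
    """
    counts = {"equal": 0, "lesser": 0, "greater": 0, "both_empty": 0}
    for a, b in zip(sketch_a, sketch_b):
        if a is None and b is None:
            counts["both_empty"] += 1
        elif a is not None and b is not None and a == b:
            counts["equal"] += 1
        elif b is None or (a is not None and a < b):
            counts["lesser"] += 1
        else:
            counts["greater"] += 1
    return counts


def estimate_overlap(sketch_a, sketch_b, size_a, size_b):
    """Estimate |A ∩ B| from the equal-bin count alone (standard OPH estimate).

    Jaccard J ~ equal / (bins - both_empty), and
    |A ∩ B| = J * (|A| + |B|) / (1 + J) when the set sizes are known.
    """
    counts = compare_bins(sketch_a, sketch_b)
    usable_bins = len(sketch_a) - counts["both_empty"]
    if usable_bins == 0:
        return 0.0
    jaccard = counts["equal"] / usable_bins
    return jaccard * (size_a + size_b) / (1.0 + jaccard)


if __name__ == "__main__":
    set_a = {f"user-{i}" for i in range(1000)}
    set_b = {f"user-{i}" for i in range(500, 1500)}
    sk_a, sk_b = oph_sketch(set_a), oph_sketch(set_b)
    print(compare_bins(sk_a, sk_b))
    print(estimate_overlap(sk_a, sk_b, len(set_a), len(set_b)))
```

With only 64 bins the estimate is noisy; increasing `num_bins` trades sketch size for lower variance. The lesser and greater counts are reported here but unused, since how the disclosed systems weight them is not stated in the abstract.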