Machine-learning techniques for evaluating suitability of candidate datasets for target applications

    Publication No.: US11704598B2

    Publication date: 2023-07-18

    Application No.: US17929394

    Filing date: 2022-09-02

    Applicant: Adobe Inc.

    CPC classification number: G06N20/00 G06F16/2264 G06F16/285

    Abstract: Techniques disclosed herein relate generally to evaluating and selecting candidate datasets for use by software applications, such as selecting candidate datasets for training machine-learning models used in software applications. Various machine-learning and other data science techniques are used to identify unique entities in a candidate dataset that are likely to be part of target entities for a software application. A merit attribute is then determined for the candidate dataset based on the number of unique entities that are likely to be part of the target entities, and weights associated with these unique entities. The merit attribute is used to identify the most efficient or most cost-effective candidate dataset for the software application.
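
    Illustrative sketch: the abstract says the merit attribute depends on the number of unique entities likely to be target entities and on their weights, but it gives no formula. The Python below is one possible reading, not the patented method. The `Entity` fields, the 0.5 probability threshold, and the division by dataset cost are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str
    match_probability: float  # hypothetical model output: chance this is a target entity
    weight: float             # hypothetical importance weight for this entity


def merit(entities, cost, threshold=0.5):
    """Weighted count of likely-target unique entities per unit cost (assumed formula)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    # De-duplicate by id so each unique entity is counted once (last record wins).
    unique = {e.entity_id: e for e in entities}
    likely = [e for e in unique.values() if e.match_probability >= threshold]
    return sum(e.weight for e in likely) / cost


candidate = [
    Entity("a", 0.9, 2.0),
    Entity("a", 0.9, 2.0),  # duplicate record of the same entity
    Entity("b", 0.2, 5.0),  # unlikely to be a target entity, so ignored
    Entity("c", 0.6, 1.0),
]
score = merit(candidate, cost=3.0)
```

    Under these assumptions, comparing `merit(...)` across candidate datasets would rank them by weighted coverage of likely target entities per unit cost.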

    Segmenting users with sparse data utilizing hash partitions

    Publication No.: US20220253463A1

    Publication date: 2022-08-11

    Application No.: US17660328

    Filing date: 2022-04-22

    Applicant: Adobe Inc.

    Abstract: The present disclosure describes systems, non-transitory computer-readable media, and methods for utilizing hash partitions to determine local densities and distances among users (or among other represented data points) for clustering sparse data into segments. For instance, the disclosed systems can generate hash signatures for users in a sparse dataset and can map users to hash partitions based on the hash signatures. The disclosed systems can further determine local densities and separation distances for particular users (or other represented data points) within the hash partitions. Upon determining local densities and separation distances for datapoints from the dataset, the disclosed systems can select a segment (or cluster of data points) grouped according to a hierarchy of a clustering algorithm, such as a density-peaks-clustering algorithm.
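
    Illustrative sketch: the steps named in the abstract (hash signatures, hash partitions, then local densities and separation distances within a partition) can be approximated as follows. This is a generic MinHash plus density-peaks outline, not the disclosed system. The SHA-1 based hashing, the Jaccard distance, the cutoff value, and all function names are assumptions, and every user is assumed to have a non-empty feature set.

```python
import hashlib
from collections import defaultdict


def minhash_signature(features, num_hashes=4):
    """MinHash signature of a non-empty set of feature names (assumed scheme)."""
    def h(seed, feature):
        digest = hashlib.sha1(f"{seed}:{feature}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return tuple(min(h(seed, f) for f in features) for seed in range(num_hashes))


def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)


def partition_users(users, num_hashes=4):
    """Map each user to the partition keyed by its full signature."""
    partitions = defaultdict(list)
    for user_id, features in users.items():
        partitions[minhash_signature(features, num_hashes)].append(user_id)
    return dict(partitions)


def density_and_separation(users, members, cutoff=0.4):
    """Local density and separation distance for users within one partition."""
    def dist(u, v):
        return jaccard_distance(users[u], users[v])

    # Local density: number of other members within the cutoff distance.
    density = {
        u: sum(1 for v in members if v != u and dist(u, v) <= cutoff)
        for u in members
    }
    separation = {}
    for u in members:
        higher = [dist(u, v) for v in members if density[v] > density[u]]
        if higher:
            # Distance to the nearest member with strictly higher density.
            separation[u] = min(higher)
        else:
            # Densest member(s): use the largest distance to any other member.
            separation[u] = max((dist(u, v) for v in members if v != u), default=0.0)
    return density, separation


users = {
    "a": {"f1", "f2", "f3"},
    "b": {"f1", "f2", "f3", "f4"},
    "c": {"f1", "f2"},
    "d": {"f7", "f8"},
}
density, separation = density_and_separation(users, list(users))
```

    Members with both high density and high separation are the natural candidates for segment centers in a density-peaks style clustering; restricting the pairwise comparisons to one partition is what keeps the computation tractable on sparse data.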

    Customized geospatial population segmentation based on a received polygon definition

    Publication No.: US11355035B2

    Publication date: 2022-06-07

    Application No.: US16534505

    Filing date: 2019-08-07

    Applicant: Adobe Inc.

    Abstract: Certain embodiments involve generating a real-time notification to facilitate the delivery of customized content, based on detecting that a subject activity occurs within a customized subject region. For instance, a computing system updates a graphical interface to display, as a layer on a map, detected instances of a subject activity performed within a geographic area of the map. The computing system determines a polygon corresponding to a region of the geographic area, where the polygon is determined from a graphical input representing lines at locations on the map on which the detected instances are overlaid. The computing system determines that a location of a detected instance of the subject activity performed by a user device falls within the polygon. The computing system transmits a notification to a content provider, such that the content provider delivers customized content to the user device that performed the detected instance of the subject activity.
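
    Illustrative sketch: the test of whether a detected location falls within the drawn polygon can be done with standard ray casting. The abstract does not say which test the system uses, so this is an assumption. Coordinates are treated as planar (x, y) pairs, and points exactly on the boundary are not handled specially.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly connected back to the first.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


region = [(0, 0), (4, 0), (4, 4), (0, 4)]  # hypothetical polygon drawn on the map
hit = point_in_polygon((2, 2), region)
miss = point_in_polygon((5, 2), region)
```

    A positive result for a device location would be the trigger for the notification step described in the abstract.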

    Automatically generating user segments

    Publication No.: US20210311969A1

    Publication date: 2021-10-07

    Application No.: US17245378

    Filing date: 2021-04-30

    Applicant: Adobe Inc.

    Abstract: Systems, methods, and non-transitory computer-readable media (systems) are disclosed for generating meaningful and insightful user segment reports based on a high dimensional data space. In particular, in one or more embodiments, the disclosed systems utilize a relaxed bi-clustering model to automatically identify user segments in a data space including datasets of features specific to individual users. In at least one embodiment, the disclosed systems identify and include users in automatically generated user segments even though those users are associated with some, but perhaps not all, of the features as other members in the automatically generated user segments.

    Customized geospatial population segmentation based on a received polygon definition

    Publication No.: US20210043116A1

    Publication date: 2021-02-11

    Application No.: US16534505

    Filing date: 2019-08-07

    Applicant: Adobe Inc.

    Abstract: Certain embodiments involve generating a real-time notification to facilitate the delivery of customized content, based on detecting that a subject activity occurs within a customized subject region. For instance, a computing system updates a graphical interface to display, as a layer on a map, detected instances of a subject activity performed within a geographic area of the map. The computing system determines a polygon corresponding to a region of the geographic area, where the polygon is determined from a graphical input representing lines at locations on the map on which the detected instances are overlaid. The computing system determines that a location of a detected instance of the subject activity performed by a user device falls within the polygon. The computing system transmits a notification to a content provider, such that the content provider delivers customized content to the user device that performed the detected instance of the subject activity.

    Identifying multiple devices belonging to a single user

    Publication No.: US10785134B2

    Publication date: 2020-09-22

    Application No.: US16141374

    Filing date: 2018-09-25

    Applicant: Adobe Inc.

    Abstract: Techniques are disclosed that provide more accurate clustering of devices by forming clusters of devices and merging or changing clusters based on predetermined criteria. The technique starts with a large number of clusters (e.g., one for each account) and refines the clusters, for example, by merging clusters or determining which cluster a given device should be in when the device is associated with multiple clusters. One technique iteratively adjusts clusters of devices by merging clusters determined to be associated with a single user until a cluster contains all of the devices and accounts expected to be associated with a single user.
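
    Illustrative sketch: the iterative refinement described in the abstract (start with one cluster per account, then merge clusters judged to belong to a single user until nothing changes) can be outlined as below. The abstract only speaks of "predetermined criteria", so the merge rule used here, two clusters sharing at least one device, is a stand-in assumption, as are the function names.

```python
def merge_account_clusters(account_devices, should_merge):
    """Start with one cluster per account and merge until no pair qualifies.

    account_devices maps an account id to the set of device ids seen on it.
    should_merge(cluster_a, cluster_b) is the caller-supplied merge criterion;
    each cluster is a tuple (set_of_accounts, set_of_devices).
    """
    clusters = [({acct}, set(devs)) for acct, devs in account_devices.items()]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_merge(clusters[i], clusters[j]):
                    accounts = clusters[i][0] | clusters[j][0]
                    devices = clusters[i][1] | clusters[j][1]
                    clusters[i] = (accounts, devices)
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the merged cluster may now match others
    return clusters


def shares_device(a, b):
    """Assumed criterion: two clusters share at least one device."""
    return bool(a[1] & b[1])


accounts = {
    "acct1": {"phone", "laptop"},
    "acct2": {"laptop", "tablet"},
    "acct3": {"kiosk"},
    "acct4": {"tablet"},
}
result = merge_account_clusters(accounts, shares_device)
```

    Passing the criterion in as a function keeps the loop independent of how "same user" is decided, so a stricter rule could be substituted without changing the merging logic.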

    Trait expansion techniques in binary matrix datasets

    Publication No.: US11899693B2

    Publication date: 2024-02-13

    Application No.: US17677323

    Filing date: 2022-02-22

    Applicant: Adobe Inc.

    CPC classification number: G06F16/285

    Abstract: A cluster generation system identifies data elements, from a first binary record, that each have a particular value and correspond to respective binary traits. A candidate description function describing the binary traits is generated, the candidate description function including a model factor that describes the data elements. Responsive to determining that a second record has additional data elements having the particular value and corresponding to the respective binary traits, the candidate description function is modified to indicate that the model factor describes the additional elements. The candidate description function is also modified to include a correction factor describing an additional binary trait excluded from the respective binary traits. Based on the modified candidate description function, the cluster generation system generates a data summary cluster, which includes a compact representation of the binary traits of the data elements and additional data elements.

    Dynamic clustering of sparse data utilizing hash partitions

    Publication No.: US11328002B2

    Publication date: 2022-05-10

    Application No.: US16852110

    Filing date: 2020-04-17

    Applicant: Adobe Inc.

    Abstract: The present disclosure describes systems, non-transitory computer-readable media, and methods for utilizing hash partitions to determine local densities and distances among users (or among other represented data points) for clustering sparse data into segments. For instance, the disclosed systems can generate hash signatures for users in a sparse dataset and can map users to hash partitions based on the hash signatures. The disclosed systems can further determine local densities and separation distances for particular users (or other represented data points) within the hash partitions. Upon determining local densities and separation distances for datapoints from the dataset, the disclosed systems can select a segment (or cluster of data points) grouped according to a hierarchy of a clustering algorithm, such as a density-peaks-clustering algorithm.

    Delivery of Contextual Interest from Interaction Information

    Publication No.: US20200065425A1

    Publication date: 2020-02-27

    Application No.: US16107769

    Filing date: 2018-08-21

    Applicant: Adobe Inc.

    Abstract: Systems and techniques for delivery of contextual interest from interaction information are described that process user interactions with digital content to generate user interest scores for various topics. A contextual user interest system uses user interaction data to identify and contextualize content, and assigns propensity scores to the contextualized content. By dynamically contextualizing pages of content, the contextual user interest system may adapt to changes in the content and provide more accurate and robust information over time, which is not possible using conventional techniques. The contextualized pages of content are used to assign user interest scores across a number of topics to users who have visited the pages of content, and the user interest scores are normalized in a manner that allows a user's degree of interest in a topic to be compared to that of another user.
