Generation of training data to train a classifier to identify distinct physical user devices in a cross-device context
Abstract:
Techniques are disclosed for accurately identifying distinct physical user devices in a cross-device context. An example embodiment applies a multi-phase approach to generate labeled training datasets from a corpus of unlabeled device records. Such labeled training datasets can be used for training machine learning systems to predict the occurrence of device records that have been wrongly (or correctly, as the case may be) attributed to different physical user devices. Such identification of improper attribution can be particularly helpful in web-based analytics. The labeled training datasets include labeled pairs of device records generated using multiple strategies for inferring whether the two device records of a pair of device records represent the same physical user device (or different physical user devices). The labeled pairs of device records can then be used to train classifiers to predict with confidence whether two device records represent or do not represent the same physical user device.
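The abstract does not specify the record schema or the strategies used to infer whether two device records belong to one physical device. The sketch below is therefore only an illustration under stated assumptions. It uses a weak-supervision style of pair labeling: each strategy votes "same device", "different devices", or abstains, and a pair gets a label only when all non-abstaining strategies agree. The field names (`hardware_id`, `os_family`, `screen`), the three strategies, and `label_pairs` are hypothetical and are not the disclosed method.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class DeviceRecord:
    # Hypothetical schema: the abstract does not define record fields.
    record_id: str
    hardware_id: Optional[str]  # persistent device identifier, when observed
    os_family: str
    screen: str  # e.g. "1920x1080"


# A strategy votes True (same device), False (different devices), or None (abstain).
Strategy = Callable[[DeviceRecord, DeviceRecord], Optional[bool]]


def same_hardware_id(a: DeviceRecord, b: DeviceRecord) -> Optional[bool]:
    # Assumed rule: if both records carry a persistent ID, equal IDs imply
    # one device and unequal IDs imply two; otherwise abstain.
    if a.hardware_id is None or b.hardware_id is None:
        return None
    return a.hardware_id == b.hardware_id


def os_mismatch(a: DeviceRecord, b: DeviceRecord) -> Optional[bool]:
    # Assumed rule: different OS families indicate different devices.
    return False if a.os_family != b.os_family else None


def screen_mismatch(a: DeviceRecord, b: DeviceRecord) -> Optional[bool]:
    # Assumed rule: different screen resolutions indicate different devices.
    return False if a.screen != b.screen else None


STRATEGIES: List[Strategy] = [same_hardware_id, os_mismatch, screen_mismatch]


def label_pairs(
    records: List[DeviceRecord], strategies: List[Strategy] = STRATEGIES
) -> List[Tuple[str, str, int]]:
    """Return (id_a, id_b, label) triples; label 1 = same device, 0 = different.

    Pairs where every strategy abstains, or where strategies disagree,
    are left unlabeled (omitted from the output).
    """
    labeled = []
    for a, b in combinations(records, 2):
        # Collect the distinct non-abstaining votes for this pair.
        votes = {v for v in (s(a, b) for s in strategies) if v is not None}
        if len(votes) == 1:
            labeled.append((a.record_id, b.record_id, int(votes.pop())))
    return labeled


if __name__ == "__main__":
    records = [
        DeviceRecord("r1", "hw-A", "iOS", "1170x2532"),
        DeviceRecord("r2", "hw-A", "iOS", "1170x2532"),
        DeviceRecord("r3", None, "Android", "1080x2400"),
        DeviceRecord("r4", None, "Android", "1080x2400"),
    ]
    for triple in label_pairs(records):
        print(triple)
```

Here the pair (r3, r4) stays unlabeled because no strategy has evidence either way. A pair that shares a hardware ID but differs in screen resolution would also be dropped as a conflict. Requiring agreement gives up coverage in exchange for cleaner labels. In a fuller pipeline, the labeled triples would then be joined with pair-level features to train a binary same-device classifier.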