Object count prediction using distributed processing
Abstract:
Techniques are provided for processing distributed stored objects quickly enough to yield a timely, accurate prediction of the number of live objects that a parameterized file request will produce. Stored objects representing previous user webpage visit interactions are held in different storage locations in a data store. The stored objects at each storage location are processed in parallel: a hash function spreads them approximately uniformly into buckets, and sub-buckets corresponding to selected category identifiers (IDs) are formed within each bucket. Also in parallel, K-minimum values are computed for each sub-bucket to estimate the count of stored objects in the data store. The K-minimum values for sub-buckets corresponding to the same category ID across all buckets are combined, in some cases harmonically, and used to generate a predicted number of live objects responsive to a parameterized file request.
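To make the bucket, sub-bucket, and K-minimum-values steps concrete, the following is a minimal single-process sketch and not the claimed implementation. The function names, the use of SHA-256, the defaults `num_buckets=8` and `k=256`, and the `(object_id, category_id)` input shape are all assumptions made for illustration. The combination step here simply sums the per-bucket estimates for a category, which is valid because the hash assigns each object to exactly one bucket; the harmonic combination mentioned in the abstract is not shown.

```python
import hashlib
import heapq
from collections import defaultdict


def _hash_parts(object_id: str, num_buckets: int):
    """Derive a bucket index and an independent value in [0, 1) from one digest."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets   # bytes 0-3 pick the bucket
    value = int.from_bytes(digest[4:12], "big") / 2**64        # bytes 4-11 feed the KMV sketch
    return bucket, value


def build_sketches(objects, num_buckets=8, k=256):
    """Return {(bucket, category_id): sorted list of the k smallest distinct hash values}.

    For brevity this keeps every distinct value per sub-bucket before trimming;
    a memory-bounded version would retain only k values at a time.
    """
    seen = defaultdict(set)
    for object_id, category_id in objects:
        bucket, value = _hash_parts(object_id, num_buckets)
        seen[(bucket, category_id)].add(value)
    # heapq.nsmallest returns its result sorted in ascending order.
    return {key: heapq.nsmallest(k, values) for key, values in seen.items()}


def estimate_sub_bucket(values, k):
    """Standard KMV estimate: exact below k values, else (k - 1) / (k-th smallest)."""
    if len(values) < k:
        return float(len(values))
    return (k - 1) / values[k - 1]


def predict_count(sketches, category_id, k=256):
    """Sum the sub-bucket estimates for one category across all buckets (assumed combiner)."""
    return sum(
        estimate_sub_bucket(values, k)
        for (_bucket, category), values in sketches.items()
        if category == category_id
    )
```

Because each sub-bucket sketch depends only on the objects hashed into it, the sketches for different storage locations or buckets could be built by separate workers and combined afterwards, which is the property the parallel processing in the abstract relies on.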