Abstract:
Methods, systems and computer program product embodiments for hashing techniques for determining similarity between data sets are described herein. A method embodiment includes initializing a random number generator with a weighted min-hash value as a seed, wherein the weighted min-hash value approximates a similarity distance between data sets. A number of bits in the weighted min-hash value is determined by uniformly sampling an integer bit value using the random number generator. A system embodiment includes a repository configured to store a plurality of data sets and a hash generator configured to generate weighted min-hash values from the data sets. The system further includes a similarity determiner configured to determine a similarity between the data sets.
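The seeding step can be sketched as follows, treating the weighted min-hash value as already computed. This is a minimal illustration under stated assumptions, not the claimed method: Python's built-in generator stands in for the random number generator, and the name `reduce_to_bits` is hypothetical.

```python
import random


def reduce_to_bits(min_hash_value, num_bits):
    """Seed a generator with the hash value and draw a uniform num_bits-bit integer.

    Assumption: the min-hash value is an integer produced elsewhere.
    """
    rng = random.Random(min_hash_value)  # the hash value is the seed
    return rng.getrandbits(num_bits)     # uniform over 0 .. 2**num_bits - 1


compact_a = reduce_to_bits(123456789, 8)
compact_b = reduce_to_bits(123456789, 8)
```

Because the generator is seeded by the hash value, equal hash values always map to the same reduced value, so matches between data sets survive the reduction.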
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating image search results. One of the methods includes receiving first image search results responsive to a text query, each first image search result associated with a respective first score indicating a relevance of an image represented by the first image search result to the text query. Second image search results responsive to a query image are received, each second image search result associated with a respective second score indicating a measure of similarity between an image represented by the second image search result and the query image. A set of final image search results is selected, which includes combining the first scores and second scores of the selected first image search results. The final image search results are ordered by similarity to the query image.
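One possible reading of the selection and ordering steps is sketched below. The weighted sum, the `alpha` and `k` parameters, the dictionary representation, and the function name `final_results` are all assumptions made for illustration; the abstract does not specify how the scores are combined.

```python
def final_results(text_results, image_results, k=2, alpha=0.5):
    """Select images by combined score, then order them by similarity.

    text_results and image_results map an image id to its first score
    (relevance to the text query) and second score (similarity to the
    query image) respectively.
    """
    # Combine the two scores for images that appear in both result sets.
    combined = {
        image_id: alpha * text_results[image_id]
        + (1 - alpha) * image_results[image_id]
        for image_id in text_results
        if image_id in image_results
    }
    # Keep the k images with the highest combined score.
    selected = sorted(combined, key=combined.get, reverse=True)[:k]
    # Order the selected images by similarity to the query image.
    return sorted(selected, key=lambda i: image_results[i], reverse=True)


text_results = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.7}
image_results = {"a": 0.3, "b": 0.6, "c": 0.9, "e": 0.95}
top_two = final_results(text_results, image_results, k=2)
```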
Abstract:
Systems and methods for facilitating media fingerprinting are provided. In one aspect, a system can include: a memory; a microprocessor; a communication component that receives media; and a media fingerprinting component that fingerprints the media. The media fingerprinting component employs a fingerprint generation component stored in the memory and includes: a first hash generation component that generates sets of hashes corresponding to versions of the media; and a second hash generation component that computes a final hash based, at least, on hashing the sets of hashes. In some aspects, the media fingerprinting component can generate a flip-resistant fingerprint based, at least, on the final hash. In some aspects, the flip-resistant fingerprint is the final hash.
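The two-stage hashing can be illustrated with a small sketch. Several assumptions are made here that the abstract does not state: the media is a grid of byte-valued pixels, the versions are the original and its horizontal mirror, each version yields a single SHA-256 hash rather than a set of hashes, and the hashes are sorted before the second stage so that the result does not depend on which version was received. All function names are hypothetical.

```python
import hashlib


def version_hash(rows):
    """First stage: hash one version of the media (rows of 0-255 pixel values)."""
    h = hashlib.sha256()
    for row in rows:
        h.update(len(row).to_bytes(4, "big"))  # keep row boundaries unambiguous
        h.update(bytes(row))
    return h.digest()


def flip_horizontal(rows):
    """Return the mirrored version of the media."""
    return [list(reversed(row)) for row in rows]


def flip_resistant_fingerprint(rows):
    """Second stage: hash the order-independent collection of version hashes."""
    first_stage = sorted([version_hash(rows), version_hash(flip_horizontal(rows))])
    return hashlib.sha256(b"".join(first_stage)).hexdigest()


image = [[1, 2, 3], [4, 5, 6]]
mirrored = flip_horizontal(image)
different = [[1, 2, 3], [4, 5, 7]]
```

An image and its mirror produce the same pair of first-stage hashes, so the final hash is identical for both, while unrelated content yields a different fingerprint.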
Abstract:
This disclosure relates to transformation invariant media matching. A fingerprinting component can generate a transformation invariant identifier for media content by adaptively encoding the relative ordering of interest points in media content. The interest points can be grouped into subsets, and stretch invariant descriptors can be generated for the subsets based on ratios of coordinates of interest points included in the subsets. The stretch invariant descriptors can be aggregated into a transformation invariant identifier. An identification component compares the identifier against a set of identifiers for known media content, and the media content can be matched or identified as a function of the comparison.
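A small sketch shows why ratios of coordinates can be stretch invariant. It assumes, beyond what the abstract states, that subsets are triples of interest points, that ratios are taken over coordinate differences, and that sorting the points stands in for encoding their relative ordering. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def stretch_invariant_descriptor(p, q, r):
    """Ratios of coordinate differences for a triple of interest points.

    Scaling all x values by one factor and all y values by another leaves
    both ratios unchanged. Degenerate triples return None.
    """
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    if x3 == x1 or y3 == y1:
        return None
    return (Fraction(x2 - x1, x3 - x1), Fraction(y2 - y1, y3 - y1))


def identifier(points):
    """Aggregate the descriptors of all ordered triples into one identifier."""
    ordered = sorted(points)  # order by x, then y
    descriptors = []
    for triple in combinations(ordered, 3):
        d = stretch_invariant_descriptor(*triple)
        if d is not None:
            descriptors.append(d)
    return tuple(descriptors)


def stretch(points, sx, sy):
    """Stretch integer points by positive integer factors along each axis."""
    return [(x * sx, y * sy) for x, y in points]


points = [(1, 2), (4, 3), (6, 7), (9, 5)]
stretched = stretch(points, 3, 2)
other_points = [(1, 2), (5, 3), (6, 7), (9, 5)]
```

Here `identifier(points)` and `identifier(stretched)` are equal, so a lookup against identifiers of known media would match the stretched copy to the original.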
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training scoring models. One method includes storing data identifying a plurality of positive and a plurality of negative training images for a query. The method further includes selecting a first image from either the positive group of images or the negative group of images, and applying a scoring model to the first image. The method further includes selecting a plurality of candidate images from the other group of images, applying the scoring model to each of the candidate images, and then selecting a second image from the candidate images according to scores for the images. The method further includes determining that the scores for the first image and the second image fail to satisfy a criterion, updating the scoring model, and storing the updated scoring model.
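The training loop described above can be sketched as follows. The linear scoring model, the margin criterion, the perceptron-style update, and the choice to always draw the first image from the positive group are assumptions made for illustration; images are represented by hypothetical two-element feature vectors.

```python
import random


def score(weights, features):
    """Apply a linear scoring model to one image's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def train_step(weights, positives, negatives, rng,
               num_candidates=2, margin=1.0, lr=0.1):
    """One update: pick a positive, pick the highest-scoring sampled negative."""
    first = rng.choice(positives)
    candidates = rng.sample(negatives, min(num_candidates, len(negatives)))
    # Select the second image according to the candidates' scores.
    second = max(candidates, key=lambda c: score(weights, c))
    # Criterion (assumed): the positive should outscore the negative by a margin.
    if score(weights, first) - score(weights, second) < margin:
        weights = [w + lr * (p - n) for w, p, n in zip(weights, first, second)]
    return weights


rng = random.Random(0)
positives = [(1.0, 0.0), (0.9, 0.1)]
negatives = [(0.0, 1.0), (0.1, 0.9), (0.2, 0.8)]
weights = [0.0, 0.0]
for _ in range(50):
    weights = train_step(weights, positives, negatives, rng)
```

Choosing the highest-scoring candidate from the other group concentrates updates on the pairs the current model ranks worst.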
Abstract:
A method and system of identity masking to obscure identities corresponding to face regions in an image are disclosed. A face detector is applied to detect a set of possible face regions in the image. An identity masker then processes the detected face regions using identity masking techniques in order to obscure the identities corresponding to the regions. For example, a detected face region can be blurred by a motion blur algorithm as if it were in motion, such that the blurred region cannot be recognized as the original identity. Alternatively, the detected face region can be replaced with a substitute facial image by a face replacement algorithm to obscure the corresponding identity.
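As a rough sketch of the blurring option, the function below applies a simple horizontal averaging blur inside a given region of a grayscale image. The face detector is out of scope, so the region is assumed to be supplied; the averaging kernel, the `length` parameter, and the name `motion_blur_region` are illustrative assumptions rather than the disclosed algorithm.

```python
def motion_blur_region(image, box, length=3):
    """Blur pixels inside box horizontally, leaving the rest untouched.

    image is a list of rows of grayscale values; box is
    (top, left, bottom, right) with bottom and right exclusive.
    """
    top, left, bottom, right = box
    out = [row[:] for row in image]  # copy so the input is not modified
    for y in range(top, bottom):
        for x in range(left, right):
            # Average the pixel with its right-hand neighbours in the region.
            window = image[y][x:min(x + length, right)]
            out[y][x] = sum(window) / len(window)
    return out


image = [
    [0, 0, 0, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 0, 0, 9, 0],
]
blurred = motion_blur_region(image, box=(1, 1, 3, 4), length=3)
```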
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving a hash vector r, a vector of locality-sensitive hash values, each hash value being an element of the hash vector r, each element having an index position; and generating a compact vector v corresponding to the hash vector r, wherein the compact vector v is a vector of compact elements each having an index position, wherein each compact element corresponds to the element of the hash vector r having the same index position, and wherein each compact element is a b-bit integer selected from the set of all b-bit integers {0, 1, ..., 2^b − 1} based on the corresponding hash element.
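A minimal sketch of the compaction step follows. It assumes one simple selection rule, keeping the low-order b bits of each hash value; the abstract does not say how the b-bit integer is chosen, and the name `compact_vector` is hypothetical.

```python
def compact_vector(r, b):
    """Map each hash value in r to a b-bit integer at the same index position.

    Assumption: the b-bit integer is the low-order b bits of the hash value.
    """
    if b <= 0:
        raise ValueError("b must be a positive integer")
    mask = (1 << b) - 1  # largest b-bit integer, 2^b - 1
    return [h & mask for h in r]


r = [0b101101, 0b000111, 0b110000]
v = compact_vector(r, b=2)
```

Each element of `v` lies in the range 0 to 2^b − 1 and keeps the index position of the hash value it was derived from.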
Abstract:
A method and an apparatus estimate an object part location in a digital image using feature value analysis. The method according to one embodiment accesses digital image data representing a region including an object part of a digital image; accesses reference data including class data of classes relating to predetermined positions of the object part in predetermined regions, and features that discriminate among the classes; calculates feature values for the features in the region using pixel values within the region; and determines a location estimate of the object part using the feature values and the reference data.
Abstract:
A method and an apparatus automatically recognize or verify objects in a digital image using probability models. According to a first aspect, a method and apparatus automatically recognize or verify objects in a digital image by: accessing digital image data including an object of interest therein; detecting an object of interest in the image; normalizing the object to generate a normalized object representation; extracting a plurality of features from the normalized object representation; and applying each feature to a previously-determined additive probability model to determine the likelihood that the object of interest belongs to an existing class. In one embodiment, the previously-determined additive probability model is an Additive Gaussian Model.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an image ranking model to rank images based on hashes of their contents using a lookup table. An image training set is received. An image ranking model is trained with the training set by generating an image hash for each image of an ordered pair of images from the training set based on one or more features extracted from the image, computing a first score for a first image hash of a first image of the pair and a second score for a second image hash of a second image of the pair using the image ranking model, determining whether to update the image ranking model based on the first score and the second score, and updating the image ranking model using an update value based on the first score and the second score.
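The pairwise training step can be sketched with a lookup table keyed by hash position and hash value. Everything specific here is an assumption for illustration: the hash is a simple quantization of features in the range 0 to 1, the first image of a pair is the one that should rank higher, the update test is a margin on the score difference, and the function names are hypothetical.

```python
from collections import defaultdict


def image_hash(features, num_buckets=4):
    """Quantize each feature in [0, 1] into a bucket index (assumed hash)."""
    return tuple(min(int(f * num_buckets), num_buckets - 1) for f in features)


def score(table, h):
    """Sum the lookup-table weights selected by each hash element."""
    return sum(table[(i, v)] for i, v in enumerate(h))


def train_on_pair(table, first_features, second_features, margin=1.0, lr=0.5):
    """Update the table if the first image does not outscore the second."""
    h1, h2 = image_hash(first_features), image_hash(second_features)
    s1, s2 = score(table, h1), score(table, h2)
    if s1 - s2 < margin:
        update = lr * (margin - (s1 - s2))  # update value depends on both scores
        for i, v in enumerate(h1):
            table[(i, v)] += update
        for i, v in enumerate(h2):
            table[(i, v)] -= update
    return table


table = defaultdict(float)
first, second = [0.9, 0.1], [0.1, 0.8]
for _ in range(5):
    table = train_on_pair(table, first, second)
```

Scoring an image at ranking time then needs only its hash and one table lookup per hash element.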