Text-conditioned image search based on transformation, aggregation, and composition of visio-linguistic features
Abstract:
Techniques are disclosed for text-conditioned image searching. A methodology implementing the techniques includes decomposing a source image into visual feature vectors associated with different levels of granularity. The method also includes decomposing a text query (defining a target image attribute) into feature vectors associated with different levels of granularity including a global text feature vector. The method further includes generating image-text embeddings based on the visual feature vectors and the text feature vectors to encode information from visual and textual features. The method further includes composing a visio-linguistic representation based on a hierarchical aggregation of the image-text embeddings to encode visual and textual information at multiple levels of granularity. The method further includes identifying a target image that includes the visio-linguistic representation and the global text feature vector, so that the target image relates to the target image attribute, and providing the target image as an image search result.
Information query
Patent Agency Ranking
0/0