Image captioning augmented with understanding of the surrounding text
Abstract:
To augment an image caption, a caption graph is generated. The caption graph contains entity nodes corresponding to entities contained in the image, and relationship edges between entity nodes corresponding to relationships between those entities as illustrated in the image. In addition, a contextual graph is generated containing one or more of: entity nodes corresponding to entities that are contained in the image and described in text associated with the image; textual entity nodes corresponding to textual entities described in that text; and textual relationship edges between entity node pairs, textual entity node pairs, and pairs of an entity node and a textual entity node. The textual relationship edges correspond to relationships, described in the text associated with the image, between entity pairs, textual entity pairs, or pairs of an entity and a textual entity. From the contextual graph, an augmented caption graph is generated containing entity nodes, relationship edges, textual entity nodes, and textual relationship edges.
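The graph construction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify data structures or a merge procedure, so everything here is an assumption made for illustration: the `Graph` class, the `augment` function, the node and edge kind labels, and the example entities are all hypothetical. The sketch assumes the augmented caption graph is formed as the union of the caption graph and the contextual graph, with a node that appears in both the image and the text kept as an image entity node.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Hypothetical representation: node name -> kind ("entity" or "textual_entity")
    nodes: dict = field(default_factory=dict)
    # Edges as (source, label, target, kind) tuples,
    # where kind is "relationship" or "textual_relationship"
    edges: set = field(default_factory=set)

    def add_node(self, name, kind):
        # Assumption: a node already known as an image entity stays an "entity"
        if self.nodes.get(name) != "entity":
            self.nodes[name] = kind

    def add_edge(self, source, label, target, kind):
        self.edges.add((source, label, target, kind))


def augment(caption, contextual):
    """Assumed merge step: union of the caption graph and the contextual graph."""
    merged = Graph(dict(caption.nodes), set(caption.edges))
    for name, kind in contextual.nodes.items():
        merged.add_node(name, kind)
    merged.edges |= contextual.edges
    return merged


# Caption graph: entities and relationships visible in the image.
caption = Graph()
caption.add_node("woman", "entity")
caption.add_node("trophy", "entity")
caption.add_edge("woman", "holding", "trophy", "relationship")

# Contextual graph: built from text associated with the image.
contextual = Graph()
contextual.add_node("woman", "entity")               # in the image and in the text
contextual.add_node("trophy", "entity")              # in the image and in the text
contextual.add_node("tournament", "textual_entity")  # mentioned only in the text
contextual.add_edge("woman", "won", "tournament", "textual_relationship")
contextual.add_edge("trophy", "awarded_at", "tournament", "textual_relationship")

augmented = augment(caption, contextual)
```

In this sketch the augmented graph keeps the visual relationship (the woman holding the trophy) and gains the textual entity and the two textual relationship edges, which a caption generator could then use to produce a richer caption.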