Techniques for performing contextual phrase grounding
Abstract:
In various embodiments, a phrase grounding model automatically performs phrase grounding for a source sentence and a source image. The phrase grounding model determines that a first phrase included in the source sentence matches a first region of the source image based on the first phrase and at least a second phrase included in the source sentence. The phrase grounding model then generates a matched pair that specifies the first phrase and the first region. Subsequently, one or more annotation operations are performed on the source image based on the matched pair. Advantageously, the accuracy of the phrase grounding model is increased relative to prior-art solutions, which typically disregard the interrelationships between phrases.
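The idea of matching a phrase to a region using both that phrase and the other phrases in the sentence can be illustrated with a toy sketch. This is not the claimed model: the embedding vectors, the `context_weight` blending factor, the cosine-similarity scoring, and the function names `contextualize` and `ground_phrases` are all assumptions made for illustration. The abstract does not specify how the phrase and region representations are computed or compared.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def contextualize(phrase_vecs, index, context_weight=0.3):
    """Blend one phrase vector with the mean of the OTHER phrase vectors.

    Hypothetical stand-in for "based on the first phrase and at least a
    second phrase": the query depends on the whole sentence, not one phrase.
    """
    target = phrase_vecs[index]
    others = [v for i, v in enumerate(phrase_vecs) if i != index]
    if not others:
        return list(target)
    mean = [sum(col) / len(others) for col in zip(*others)]
    return [(1 - context_weight) * t + context_weight * m
            for t, m in zip(target, mean)]


def ground_phrases(phrases, phrase_vecs, regions, region_vecs, context_weight=0.3):
    """Return one (phrase, region) matched pair per phrase."""
    pairs = []
    for i, phrase in enumerate(phrases):
        query = contextualize(phrase_vecs, i, context_weight)
        # Pick the region whose vector is most similar to the contextual query.
        best = max(range(len(regions)),
                   key=lambda j: cosine(query, region_vecs[j]))
        pairs.append((phrase, regions[best]))
    return pairs


# Made-up 3-dimensional embeddings, purely for demonstration.
phrases = ["the man", "a red hat"]
phrase_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
regions = ["box_person", "box_hat"]
region_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]

matched_pairs = ground_phrases(phrases, phrase_vecs, regions, region_vecs)
print(matched_pairs)
```

Each resulting pair could then drive an annotation step, such as drawing the matched region's bounding box on the source image and labeling it with the phrase. Setting `context_weight` to 0 reduces the sketch to context-free matching, which is the kind of baseline the abstract contrasts against.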