VISUAL QUESTION ANSWERING USING VISUAL KNOWLEDGE BASES

    Publication number: US20210109956A1

    Publication date: 2021-04-15

    Application number: US16650853

    Filing date: 2018-01-30

    Abstract: An example apparatus for visual question answering includes a receiver to receive an input image and a question. The apparatus also includes an encoder to encode the input image and the question into a query representation including visual attention features. The apparatus includes a knowledge spotter to retrieve a knowledge entry from a visual knowledge base pre-built on a set of question-answer pairs. The apparatus further includes a joint embedder to jointly embed the visual attention features and the knowledge entry to generate visual-knowledge features. The apparatus also further includes an answer generator to generate an answer based on the query representation and the visual-knowledge features.
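The data flow in this abstract (knowledge spotter, joint embedder, answer generator) can be illustrated with a toy sketch. Everything concrete below is an assumption made for illustration and is not taken from the patent: the three-dimensional vectors, the cosine-similarity retrieval, the element-wise product used as the joint embedding, and the function names `spot_knowledge`, `joint_embed`, and `generate_answer`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def spot_knowledge(query, knowledge_base):
    """Hypothetical knowledge spotter: return the entry whose key best matches the query."""
    return max(knowledge_base, key=lambda entry: cosine(query, entry["key"]))


def joint_embed(visual_attention, knowledge_value):
    """Hypothetical joint embedder: element-wise product (an assumed fusion choice)."""
    return [a * b for a, b in zip(visual_attention, knowledge_value)]


def generate_answer(query, visual_knowledge, answer_vectors):
    """Hypothetical answer generator: pick the candidate closest to query + fused features."""
    combined = [q + f for q, f in zip(query, visual_knowledge)]
    return max(answer_vectors, key=lambda ans: cosine(combined, answer_vectors[ans]))


# Toy data; in a real system these vectors would come from trained encoders.
knowledge_base = [
    {"fact": "bananas are yellow", "key": [1.0, 0.0, 0.0], "value": [0.9, 0.1, 0.0]},
    {"fact": "the sky is blue", "key": [0.0, 1.0, 0.0], "value": [0.0, 0.2, 0.8]},
]
query = [0.8, 0.1, 0.1]             # stands in for the encoded image and question
visual_attention = [1.0, 0.5, 0.2]  # stands in for the visual attention features
answer_vectors = {"yellow": [1.0, 0.0, 0.0], "blue": [0.0, 0.0, 1.0]}

entry = spot_knowledge(query, knowledge_base)
features = joint_embed(visual_attention, entry["value"])
answer = generate_answer(query, features, answer_vectors)
print(entry["fact"], "->", answer)
```

The sketch only mirrors the order of operations named in the abstract; the actual encoders, knowledge base construction, and embedding functions are not specified at this level of detail.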

    Visual question answering using visual knowledge bases

    Publication number: US11663249B2

    Publication date: 2023-05-30

    Application number: US16650853

    Filing date: 2018-01-30

    CPC classification number: G06F16/3329 G06N3/045 G06N3/049 G06N5/025

    Abstract: An example apparatus for visual question answering includes a receiver to receive an input image and a question. The apparatus also includes an encoder to encode the input image and the question into a query representation including visual attention features. The apparatus includes a knowledge spotter to retrieve a knowledge entry from a visual knowledge base pre-built on a set of question-answer pairs. The apparatus further includes a joint embedder to jointly embed the visual attention features and the knowledge entry to generate visual-knowledge features. The apparatus also further includes an answer generator to generate an answer based on the query representation and the visual-knowledge features.

    Topic-guided model for image captioning system

    Publication number: US11042782B2

    Publication date: 2021-06-22

    Application number: US16473898

    Filing date: 2017-03-20

    Abstract: Techniques are provided for training and operation of a topic-guided image captioning system. A methodology implementing the techniques according to an embodiment includes generating image feature vectors, for an image to be captioned, based on application of a convolutional neural network (CNN) to the image. The method further includes generating the caption based on application of a recurrent neural network (RNN) to the image feature vectors. The RNN is configured as a long short-term memory (LSTM) RNN. The method further includes training the LSTM RNN with training images and associated training captions. The training is based on a combination of: feature vectors of the training image; feature vectors of the associated training caption; and a multimodal compact bilinear (MCB) pooling of the training caption feature vectors and an estimated topic of the training image. The estimated topic is generated by an application of the CNN to the training image.
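Multimodal compact bilinear (MCB) pooling, as the technique is generally described in the literature, projects each input vector with a count sketch and then combines the two sketches by circular convolution. The following is a minimal pure-Python sketch of that general idea, not the patent's implementation. The output dimension, the seeds, the input values, and all function names are assumptions for illustration, and the convolution is computed directly rather than with the FFT that practical implementations typically use.

```python
import random


def make_sketch_params(input_dim, output_dim, seed):
    """Draw a random bucket index and a random sign for every input dimension."""
    rng = random.Random(seed)
    hashes = [rng.randrange(output_dim) for _ in range(input_dim)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(input_dim)]
    return hashes, signs


def count_sketch(vec, hashes, signs, output_dim):
    """Project vec into output_dim buckets by adding signed values."""
    out = [0.0] * output_dim
    for value, bucket, sign in zip(vec, hashes, signs):
        out[bucket] += sign * value
    return out


def circular_convolve(a, b):
    """Direct O(d^2) circular convolution of two equal-length vectors."""
    d = len(a)
    out = [0.0] * d
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += a[i] * b[j]
    return out


def mcb_pool(x, y, output_dim=8, seed=0):
    """Compact bilinear pooling of x and y: sketch each input, then convolve."""
    hx, sx = make_sketch_params(len(x), output_dim, seed)
    hy, sy = make_sketch_params(len(y), output_dim, seed + 1)
    return circular_convolve(
        count_sketch(x, hx, sx, output_dim),
        count_sketch(y, hy, sy, output_dim),
    )


# Toy inputs standing in for caption feature vectors and an estimated topic vector.
caption_features = [0.5, -1.0, 2.0]
topic_estimate = [1.0, 0.0, 3.0, -2.0]
pooled = mcb_pool(caption_features, topic_estimate)
print(len(pooled))
```

Because the pooled vector approximates the outer product of its inputs, it is bilinear: scaling either input scales the output by the same factor.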

    Techniques for dense video descriptions

    Publication number: US11263489B2

    Publication date: 2022-03-01

    Application number: US16616533

    Filing date: 2017-06-29

    Abstract: Techniques and apparatus for generating dense natural language descriptions for video content are described. In one embodiment, for example, an apparatus may include at least one memory and logic, at least a portion of the logic comprised in hardware coupled to the at least one memory, the logic to receive a source video comprising a plurality of frames, determine a plurality of regions for each of the plurality of frames, generate at least one region-sequence connecting the determined plurality of regions, and apply a language model to the at least one region-sequence to generate description information comprising a description of at least a portion of content of the source video. Other embodiments are described and claimed.
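One way to picture a region-sequence is as a chain of bounding boxes, one per frame, that follow the same content through the video. The sketch below links regions greedily by intersection over union (IoU). The greedy IoU rule, the `(x1, y1, x2, y2)` box format, and the function names are assumptions for illustration only; the abstract does not say how regions are connected, and the language-model step is not shown.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_region_sequence(frames, start_index=0):
    """Follow one region through the frames, picking the best-overlapping box each time."""
    current = frames[0][start_index]
    sequence = [current]
    for regions in frames[1:]:
        current = max(regions, key=lambda box: iou(current, box))
        sequence.append(current)
    return sequence


# Toy video: three frames, two candidate regions per frame.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(52, 51, 62, 61), (1, 0, 11, 10)],
    [(2, 1, 12, 11), (54, 52, 64, 62)],
]
sequence = link_region_sequence(frames, start_index=0)
print(sequence)
```

Each resulting sequence would then be passed to a language model to produce a description for that part of the video.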
