-
Publication No.: US20200372058A1
Publication Date: 2020-11-26
Application No.: US16941299
Application Date: 2020-07-28
Applicant: Sony Interactive Entertainment Inc.
Inventor: Jian Zheng , Ruxin Chen
IPC: G06F16/383 , G06F16/583 , G06K9/32 , G06N5/04
Abstract: For image captioning such as for computer game images or other images, bottom-up attention is combined with top-down attention to provide a multi-level residual attention-based image captioning model. A residual attention mechanism is first applied in the Faster R-CNN network to learn better feature representations for each region by taking spatial information into consideration. In the image captioning network, taking the extracted regional features as input, a second residual attention network is implemented to fuse the regional features attentionally for subsequent caption generation.
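The abstract does not give formulas for either attention stage. The sketch below is one way to picture them, not the patented method. Stage one assumes the (1 + M) * F residual attention form from the Residual Attention Network literature: each spatial position's feature vector is scaled by one plus a sigmoid mask value, so attention can amplify a feature but never erase it. Stage two assumes a dot-product softmax attention over the extracted regional features plus a residual mean-pooled term. The function names `residual_attention` and `attentive_fusion` and the plain-list vector representation are hypothetical. In a real model, the mask logits and the query vector would come from learned networks.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def residual_attention(features: List[Vector], mask_logits: List[float]) -> List[Vector]:
    """Assumed (1 + M) * F form: one mask logit per spatial position.

    sigmoid keeps M in (0, 1), so each feature is scaled by a factor in (1, 2).
    """
    if len(features) != len(mask_logits):
        raise ValueError("one mask logit per spatial position is required")
    return [[(1.0 + sigmoid(logit)) * v for v in f]
            for f, logit in zip(features, mask_logits)]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attentive_fusion(regions: List[Vector], query: Vector) -> Vector:
    """Hypothetical fusion: attention-weighted sum of regions plus their mean."""
    weights = softmax([dot(r, query) for r in regions])
    dim = len(regions[0])
    attended = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    mean = [sum(r[d] for r in regions) / len(regions) for d in range(dim)]
    return [a + m for a, m in zip(attended, mean)]


# Toy usage: two regions, neutral mask logits, and a query that favors region 0.
scaled = residual_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
fused = attentive_fusion(scaled, [1.0, 0.0])
```

The residual mean term plays the same role as the "1 +" in stage one. If the attention weights collapse onto a single region, the fused vector still carries information from every region. The fused vector would then feed a caption decoder, which is not sketched here.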
-
Publication No.: US11636673B2
Publication Date: 2023-04-25
Application No.: US16177214
Application Date: 2018-10-31
Applicant: Sony Interactive Entertainment Inc.
Inventor: Sudha Krishnamurthy , Justice Adams , Arindam Jati , Masanori Omote , Jian Zheng
Abstract: A system enhances existing audio-visual content with audio describing the setting of the visual content. A scene annotation module classifies scene elements from an image frame received from a host system and generates a caption describing the scene elements. A text-to-speech synthesis module may then convert the caption to synthesized speech data describing the scene elements within the image frame.
-
Publication No.: US10726062B2
Publication Date: 2020-07-28
Application No.: US16206439
Application Date: 2018-11-30
Applicant: Sony Interactive Entertainment Inc.
Inventor: Jian Zheng , Ruxin Chen
IPC: G06F16/383 , G06F16/583 , G06K9/32 , G06N5/04
Abstract: For image captioning such as for computer game images or other images, bottom-up attention is combined with top-down attention to provide a multi-level residual attention-based image captioning model. A residual attention mechanism is first applied in the Faster R-CNN network to learn better feature representations for each region by taking spatial information into consideration. In the image captioning network, taking the extracted regional features as input, a second residual attention network is implemented to fuse the regional features attentionally for subsequent caption generation.
-
Publication No.: US11281709B2
Publication Date: 2022-03-22
Application No.: US16941299
Application Date: 2020-07-28
Applicant: Sony Interactive Entertainment Inc.
Inventor: Jian Zheng , Ruxin Chen
IPC: G06F16/383 , G06F16/583 , G06K9/32 , G06N5/04
Abstract: For image captioning such as for computer game images or other images, bottom-up attention is combined with top-down attention to provide a multi-level residual attention-based image captioning model. A residual attention mechanism is first applied in the Faster R-CNN network to learn better feature representations for each region by taking spatial information into consideration. In the image captioning network, taking the extracted regional features as input, a second residual attention network is implemented to fuse the regional features attentionally for subsequent caption generation.
-
Publication No.: US20230259553A1
Publication Date: 2023-08-17
Application No.: US18138620
Application Date: 2023-04-24
Applicant: Sony Interactive Entertainment Inc.
Inventor: Sudha Krishnamurthy , Justice Adams , Arindam Jati , Masanori Omote , Jian Zheng
IPC: G06F16/65 , A63F13/60 , G10L13/02 , G06V20/20 , G10L15/16 , G10L15/26 , G06N3/045 , G06V10/764 , G06V10/776 , G06V10/82 , G06V10/44 , G06V20/70 , G06V20/40
CPC classification number: G06F16/65 , A63F13/60 , G10L13/02 , G06V20/20 , G10L15/16 , G10L15/26 , G06N3/045 , G06V10/764 , G06V10/776 , G06V10/82 , G06V10/454 , G06V20/70 , G06V20/41 , G06V20/44
Abstract: A system enhances existing audio-visual content with a scene annotation module and an action description module, both of which are coupled to a controller. The scene annotation module classifies scene elements from an image frame received from a host system and generates a caption describing the scene elements. The scene annotation module includes a first neural network configured to generate a feature vector from the image frame and a second neural network configured to generate, from the feature vector, a caption describing elements within the image frame. The action description module recognizes action happening within one or more image frames received from the host system and generates a description of that action.
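The abstract names the components and how they are coupled, but it does not describe how the controller combines their outputs. The sketch below wires the pieces together under stated assumptions. Plain callables stand in for the two neural networks and the action recognizer. The controller is assumed to caption the most recent frame and join that caption with the action description. All class names, stub functions, and the toy brightness-based "models" are hypothetical illustrations, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[List[float]]   # hypothetical: a grayscale frame as rows of pixel values
FeatureVector = List[float]


@dataclass
class SceneAnnotationModule:
    encoder: Callable[[Frame], FeatureVector]   # stands in for the first neural network
    decoder: Callable[[FeatureVector], str]     # stands in for the second neural network

    def caption(self, frame: Frame) -> str:
        return self.decoder(self.encoder(frame))


@dataclass
class ActionDescriptionModule:
    recognizer: Callable[[Sequence[Frame]], str]  # stands in for an action recognition model

    def describe(self, frames: Sequence[Frame]) -> str:
        return self.recognizer(frames)


@dataclass
class Controller:
    scene: SceneAnnotationModule
    action: ActionDescriptionModule

    def annotate(self, frames: Sequence[Frame]) -> str:
        if not frames:
            raise ValueError("at least one frame is required")
        # Assumption: caption the latest frame, describe action over the whole clip.
        scene_caption = self.scene.caption(frames[-1])
        action_text = self.action.describe(frames)
        return f"{scene_caption}. {action_text}."


# Toy stand-ins so the wiring can be exercised without real models.
def mean_brightness(frame: Frame) -> float:
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def toy_encoder(frame: Frame) -> FeatureVector:
    return [mean_brightness(frame)]


def toy_decoder(features: FeatureVector) -> str:
    return "A bright scene" if features[0] > 0.5 else "A dark scene"


def toy_recognizer(frames: Sequence[Frame]) -> str:
    if mean_brightness(frames[-1]) > mean_brightness(frames[0]):
        return "The scene is getting brighter"
    return "The scene is not getting brighter"


controller = Controller(
    scene=SceneAnnotationModule(encoder=toy_encoder, decoder=toy_decoder),
    action=ActionDescriptionModule(recognizer=toy_recognizer),
)
frames = [[[0.2, 0.2]], [[0.9, 0.9]]]
description = controller.annotate(frames)
```

Keeping the encoder and decoder as separate fields mirrors the abstract's split between the network that produces the feature vector and the network that turns it into a caption. Either one can be swapped out independently. The returned text is the kind of output that a text-to-speech stage, as described for US11636673B2, would then synthesize.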
-
Publication No.: US20200175053A1
Publication Date: 2020-06-04
Application No.: US16206439
Application Date: 2018-11-30
Applicant: Sony Interactive Entertainment Inc.
Inventor: Jian Zheng , Ruxin Chen
IPC: G06F16/383 , G06N5/04 , G06K9/32 , G06F16/583
Abstract: For image captioning such as for computer game images or other images, bottom-up attention is combined with top-down attention to provide a multi-level residual attention-based image captioning model. A residual attention mechanism is first applied in the Faster R-CNN network to learn better feature representations for each region by taking spatial information into consideration. In the image captioning network, taking the extracted regional features as input, a second residual attention network is implemented to fuse the regional features attentionally for subsequent caption generation.
-