Method for QA with multi-modal information
Abstract:
Disclosed is a method for performing question answering (QA) on a video using multi-modal information. According to the present disclosure, a computing device determines core text information from the video based on question data, determines core object information or core frame information from the video based on the core text information, and performs QA for the video with a QA model, using the determined core text information together with the determined core object information or core frame information.
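The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The abstract does not specify how core text, core objects, or core frames are selected, so simple keyword overlap is swapped in here as a placeholder. The `Frame` structure, the function names, the stopword list, and the `toy_qa_model` stand-in are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stopword list; a real system would use a proper text encoder.
STOPWORDS = {"the", "a", "in", "does", "what"}


@dataclass
class Frame:
    index: int
    subtitle: str  # text aligned with this frame (assumed input format)
    objects: list[str] = field(default_factory=list)  # detected object labels (assumed)


def tokens(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation and stopwords removed."""
    words = {w.strip(".,?!").lower() for w in text.split()}
    return {w for w in words if w and w not in STOPWORDS}


def select_core_text(question: str, frames: list[Frame], top_k: int = 2) -> list[str]:
    """Stage 1: pick the text segments that overlap most with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(f.subtitle)), f) for f in frames]
    scored = [(s, f) for s, f in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1].index))
    return [f.subtitle for _, f in scored[:top_k]]


def select_core_frames(core_text: list[str], frames: list[Frame]) -> list[Frame]:
    """Stage 2: keep frames containing an object mentioned in the core text."""
    words: set[str] = set()
    for text in core_text:
        words |= tokens(text)
    return [f for f in frames if words & {o.lower() for o in f.objects}]


def toy_qa_model(question: str, core_text: list[str], core_frames: list[Frame]) -> list[str]:
    """Placeholder QA model: objects named in the core text but not in the question."""
    q = tokens(question)
    words: set[str] = set()
    for text in core_text:
        words |= tokens(text)
    candidates: list[str] = []
    for frame in core_frames:
        for obj in frame.objects:
            obj = obj.lower()
            if obj in words and obj not in q and obj not in candidates:
                candidates.append(obj)
    return candidates


def answer_question(question: str, frames: list[Frame], qa_model: Callable) -> list[str]:
    """Stage 3: run the QA model on the core text and core frames only."""
    core_text = select_core_text(question, frames)
    core_frames = select_core_frames(core_text, frames)
    return qa_model(question, core_text, core_frames)


FRAMES = [
    Frame(0, "A man walks into the kitchen.", ["man", "door"]),
    Frame(1, "He picks up a red cup.", ["man", "cup"]),
    Frame(2, "The dog sleeps outside.", ["dog"]),
]
QUESTION = "What does the man pick up in the kitchen?"

if __name__ == "__main__":
    print(answer_question(QUESTION, FRAMES, toy_qa_model))
```

In this sketch the text stage runs first and its output narrows the visual stage, mirroring the order given in the abstract. The QA model is passed in as a callable so that the placeholder can be replaced by a trained model without changing the selection steps.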