Method and system for generating a text summary for a multimedia content
Abstract:
The present disclosure relates to effectively determining spatial and temporal features for extracting low-level and high-level features from image frames of a multimedia content. A plurality of image frames is received from an imaging unit. Spatial filters are applied to each image frame to generate a first set of activation maps, which provide spatial features in the image frames. Further, a temporal filter is applied to the plurality of image frames at a plurality of levels to generate one or more second sets of activation maps corresponding to each level, for determining temporal features in the image frames. Thereafter, the spatial features from each image frame and the temporal features of the plurality of image frames from each level are extracted; these represent low-level and high-level spatial and temporal features.
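The flow described in the abstract can be illustrated with a minimal sketch. It is not the disclosed implementation, and several points are assumptions made only for illustration: each filter is a small cross-correlation kernel followed by a ReLU, a "level" is read as one more pass of the same temporal filter over the output of the previous level, and all names (`spatial_activation`, `temporal_activation`, `extract_features`) are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale image frame, indexed [row][column]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def spatial_activation(frame: Frame, kernel: Frame) -> Frame:
    """Valid 2D cross-correlation of one frame with one spatial kernel, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            relu(sum(frame[i + a][j + b] * kernel[a][b]
                     for a in range(kh) for b in range(kw)))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def temporal_activation(frames: List[Frame], kernel: List[float]) -> List[Frame]:
    """Valid 1D cross-correlation along the time axis for every pixel, then ReLU."""
    kt = len(kernel)
    h, w = len(frames[0]), len(frames[0][0])
    return [
        [
            [relu(sum(frames[t + k][i][j] * kernel[k] for k in range(kt)))
             for j in range(w)]
            for i in range(h)
        ]
        for t in range(len(frames) - kt + 1)
    ]


def extract_features(
    frames: List[Frame],
    spatial_kernels: List[Frame],
    temporal_kernel: List[float],
    levels: int,
) -> Tuple[List[List[Frame]], List[List[Frame]]]:
    # First set: one activation map per (frame, spatial kernel) pair.
    first_set = [[spatial_activation(f, k) for k in spatial_kernels] for f in frames]
    # Second sets: one list of maps per level (assumed: levels are stacked passes).
    second_sets = []
    current = frames
    for _ in range(levels):
        current = temporal_activation(current, temporal_kernel)
        second_sets.append(current)
    return first_set, second_sets


# Four uniform 3x3 frames whose brightness grows by 1 per frame.
frames = [[[float(t)] * 3 for _ in range(3)] for t in range(4)]
first_set, second_sets = extract_features(
    frames,
    spatial_kernels=[[[1.0, -1.0]]],  # horizontal difference between neighbours
    temporal_kernel=[-1.0, 1.0],      # difference between consecutive frames
    levels=2,
)
```

In this toy input the frames are spatially uniform, so the spatial maps are all zero. The first temporal level responds to the steady brightening, and the second level, which sees a constant signal, returns to zero. A learned model would use many trained kernels instead of the hand-picked ones above.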