CV, AI | Dec 15, 2021

Predicting Media Memorability: Comparing Visual, Textual and Auditory Features

Lorin Sweeney, Graham Healy, Alan F. Smeaton

arXiv:2112.07969v1 | 2.66 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of automatically predicting video memorability for media analysis. It is incremental: it builds on the authors' previous submission and focuses on comparative insights across modalities.

The paper tackles video memorability prediction by comparing visual, textual, and auditory features. Its best short-term memorability score on the Memento10k dataset, 0.524, comes from a Bayesian Ridge Regressor fit with DenseNet121 visual features.

This paper describes our approach to the Predicting Media Memorability task in MediaEval 2021, which aims to address the question of media memorability by setting the task of automatically predicting video memorability. This year we tackle the task from a comparative standpoint, looking to gain deeper insights into each of the three explored modalities, and using our results from last year's submission (2020) as a point of reference. Our best performing short-term memorability model (0.132) tested on the TRECVid2019 dataset was, just like last year, a frame-based CNN that was not trained on any TRECVid data. Our best short-term memorability model (0.524) tested on the Memento10k dataset was a Bayesian Ridge Regressor fit with DenseNet121 visual features.
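As a rough illustration of the second result, the sketch below fits a Bayesian ridge regressor on fixed-length visual feature vectors and scores its predictions with Spearman's rank correlation. It is not the authors' pipeline. The features are random stand-ins rather than real DenseNet121 outputs, the 1024-dimensional size assumes globally pooled DenseNet121 features, the helper names `make_synthetic_split` and `fit_and_evaluate` are hypothetical, and treating the reported scores as Spearman correlations is an assumption the abstract does not state.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import BayesianRidge

# Assumption: one 1024-d vector per video, the size of globally pooled
# DenseNet121 features. The abstract does not say how features were pooled.
FEATURE_DIM = 1024


def make_synthetic_split(n_videos, rng, weights):
    """Hypothetical stand-in for (visual features, memorability scores)."""
    features = rng.normal(size=(n_videos, FEATURE_DIM))
    # Scores are a noisy linear function of the features, for illustration only.
    scores = features @ weights + rng.normal(scale=0.5, size=n_videos)
    return features, scores


def fit_and_evaluate(train_x, train_y, test_x, test_y):
    """Fit a Bayesian ridge regressor and return it with its Spearman score."""
    model = BayesianRidge()
    model.fit(train_x, train_y)
    predictions = model.predict(test_x)
    # Rank correlation between predicted and ground-truth scores.
    rho, _ = spearmanr(predictions, test_y)
    return model, float(rho)


rng = np.random.default_rng(0)
weights = rng.normal(scale=0.05, size=FEATURE_DIM)
train_x, train_y = make_synthetic_split(400, rng, weights)
test_x, test_y = make_synthetic_split(100, rng, weights)

model, rho = fit_and_evaluate(train_x, train_y, test_x, test_y)
print(f"Spearman rank correlation on held-out data: {rho:.3f}")
```

To use real data, `train_x` and `test_x` would be replaced by features extracted from video frames and the targets by the dataset's short-term memorability annotations. The number printed here reflects only the synthetic data and is not comparable to the scores in the abstract.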
