CV · AI · Dec 15, 2021

Predicting Media Memorability: Comparing Visual, Textual and Auditory Features

arXiv:2112.07969v1 · 6 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses automatic prediction of video memorability for media analysis. It is incremental: it builds on the authors' previous submissions and focuses on comparative insights across modalities.

The paper tackles video memorability prediction by comparing visual, textual, and auditory features. Its best short-term memorability result on the Memento10k dataset, 0.524, came from a Bayesian Ridge Regressor fit with DenseNet121 visual features.

This paper describes our approach to the Predicting Media Memorability task in MediaEval 2021, which aims to address the question of media memorability by setting the task of automatically predicting video memorability. This year we tackle the task from a comparative standpoint, looking to gain deeper insights into each of the three explored modalities, and using our results from last year's submission (2020) as a point of reference. Our best-performing short-term memorability model (0.132) tested on the TRECVid2019 dataset -- just like last year -- was a frame-based CNN that was not trained on any TRECVid data, and our best short-term memorability model (0.524) tested on the Memento10k dataset was a Bayesian Ridge Regressor fit with DenseNet121 visual features.

