LG · AI · MM · Feb 1, 2021

Multi-modal Ensemble Models for Predicting Video Memorability

arXiv:2102.01173v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of predicting media memorability for applications in content creation and recommendation, though it builds incrementally on existing techniques.

The paper tackles video memorability prediction with multi-modal ensemble methods built on video, image, text, and audio features, and shows that extracted audio embeddings are an effective and highly generalizable feature on the MediaEval2020 benchmark.

Modeling media memorability has been a consistent challenge in the field of machine learning. The Predicting Media Memorability task in MediaEval2020 is the latest benchmark among similar challenges addressing this topic. Building upon techniques developed in previous iterations of the challenge, we developed ensemble methods with the use of extracted video, image, text, and audio features. Critically, in this work we introduce and demonstrate the efficacy and high generalizability of extracted audio embeddings as a feature for the task of predicting media memorability.
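The abstract does not say how the per-modality models are combined, so the following is only a minimal sketch of one common option, assuming late fusion: each modality (e.g. video, audio) has its own regressor, and the ensemble is a weighted average of their memorability scores. Predictions are scored with Spearman's rank correlation, the rank-based metric used in the MediaEval memorability tasks. The function names, modality weights, and all numbers below are hypothetical toy values, not the authors' models or results.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def late_fusion(preds: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Hypothetical late-fusion ensemble: weighted mean of per-modality scores."""
    total = sum(weights[m] for m in preds)
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) / total for i in range(n)]


# Toy example: five clips, two modalities (all values made up).
ground_truth = [0.91, 0.85, 0.78, 0.88, 0.70]
preds = {
    "video": [0.90, 0.80, 0.79, 0.86, 0.72],
    "audio": [0.88, 0.90, 0.75, 0.80, 0.74],
}
fused = late_fusion(preds, {"video": 0.5, "audio": 0.5})
print(round(spearman_rho(fused, ground_truth), 3))  # prints 0.9
```

Because the metric depends only on ranks, the fusion weights matter only insofar as they change the ordering of clips; in practice such weights would be tuned on a validation split.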

Code implementations: 1 repo
