CV · Aug 27, 2023

MM-AU: Towards Multimodal Understanding of Advertisement Videos

arXiv:2308.14052v1 · 14 citations · h-index: 18
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for automated analysis of advertisement videos in e-commerce and awareness campaigns. Its contribution is incremental, since it builds on existing multimodal methods.

The authors address advertisement video understanding by introducing MM-AU, a multimodal multilingual benchmark of over 8.4K videos. They show that multimodal transformer-based models outperform unimodal approaches on three tasks: topic categorization, perceived tone transition, and social message detection.

Advertisement videos (ads) play an integral part in the domain of Internet e-commerce as they amplify the reach of particular products to a broad audience or can serve as a medium to raise awareness about specific issues through concise narrative structures. The narrative structures of advertisements involve several elements like reasoning about the broad content (topic and the underlying message) and examining fine-grained details involving the transition of perceived tone due to the specific sequence of events and interaction among characters. In this work, to facilitate the understanding of advertisements along the three important dimensions of topic categorization, perceived tone transition, and social message detection, we introduce a multimodal multilingual benchmark called MM-AU composed of over 8.4K videos (147 hours) curated from multiple web sources. We explore multiple zero-shot reasoning baselines through the application of large language models on the ads transcripts. Further, we demonstrate that leveraging signals from multiple modalities, including audio, video, and text, in multimodal transformer-based supervised models leads to improved performance compared to unimodal approaches.
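To make the zero-shot baseline idea concrete, the following is a minimal sketch of how an ad transcript might be turned into a prompt for a large language model and how a free-text reply might be mapped back to a label. It is not the authors' pipeline: the function names `build_zero_shot_prompt` and `parse_label`, the prompt wording, and the label strings are all hypothetical, and no model is actually called.

```python
from typing import Optional, Sequence


def build_zero_shot_prompt(transcript: str, task: str, labels: Sequence[str]) -> str:
    """Format an ad transcript and a closed label set into one prompt string.

    The prompt wording is an assumption for illustration only.
    """
    options = ", ".join(labels)
    return (
        f"Task: {task}\n"
        f"Allowed answers: {options}\n"
        f"Ad transcript: {transcript.strip()}\n"
        "Answer with exactly one of the allowed answers."
    )


def parse_label(response: str, labels: Sequence[str]) -> Optional[str]:
    """Return the allowed label that occurs earliest in the response, or None."""
    lowered = response.lower()
    # Collect (position, label) pairs for every label found in the response.
    hits = [(lowered.find(label.lower()), label)
            for label in labels if label.lower() in lowered]
    return min(hits)[1] if hits else None


# Hypothetical label set; the benchmark's real label names may differ.
labels = ["present", "absent"]
prompt = build_zero_shot_prompt("  Buy now.  ", "social message detection", labels)
prediction = parse_label("The message is Absent.", labels)
```

In practice `prompt` would be sent to whichever language model is being evaluated, and `parse_label` would be applied to its reply. A reply that matches no allowed label returns `None`, so unparseable answers can be counted separately rather than silently assigned to a class.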
