MLLM-VADStory: Domain Knowledge-Driven Multimodal LLMs for Video Ad Storyline Insights
This work addresses the need for scalable insights into video ad storylines that can guide advertisers in creative design, though its contribution is incremental: it applies existing MLLM techniques with domain-specific adaptations.
The researchers tackled the problem of understanding video ad storylines at scale by developing MLLM-VADStory, a domain knowledge-guided multimodal LLM framework that segments ads into functional units and classifies each unit using an advertising-specific taxonomy. Applying it to 50k social media video ads, they found that story-based creatives improve video retention, and they recommend top-performing story arcs.
We propose MLLM-VADStory, a novel domain knowledge-guided multimodal large language model (MLLM) framework for systematically quantifying video ad storylines and generating insights about them at scale. The framework rests on the core idea that ad narratives are structured by functional intent: each scene unit performs a distinct communicative function, delivering product- and brand-oriented information within seconds. MLLM-VADStory segments ads into functional units, classifies each unit's function using a novel advertising-specific functional role taxonomy, and then aggregates the resulting functional sequences across ads to recover data-driven storyline structures. Applying the framework to 50k social media video ads across four industry subverticals, we find that story-based creatives improve video retention, and we recommend top-performing story arcs to guide advertisers in creative design. Our framework demonstrates the value of using domain knowledge to guide MLLMs toward scalable insights into video ad storylines, making it a versatile tool for understanding video creatives in general.
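The abstract does not say how functional sequences are aggregated into storyline structures or how arcs are ranked. The following is a minimal sketch of one simple way to do it, not the paper's method. It assumes each ad has already been segmented and labeled with one functional role per unit, and that a per-ad retention score is available. Consecutive repeated roles are collapsed into an "arc", ads sharing the same arc are grouped, and arcs with enough support are ranked by mean retention. The names `AdRecord`, `collapse_repeats`, `rank_story_arcs`, `min_support`, the role labels, and the retention values are all hypothetical illustrations, not the paper's taxonomy or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class AdRecord:
    ad_id: str
    roles: list[str]  # hypothetical functional role per scene unit, in temporal order
    retention: float  # hypothetical per-ad retention score (higher is better)


def collapse_repeats(roles: list[str]) -> tuple[str, ...]:
    """Merge consecutive identical roles, e.g. hook, hook, demo -> hook, demo."""
    arc: list[str] = []
    for role in roles:
        if not arc or arc[-1] != role:
            arc.append(role)
    return tuple(arc)


def rank_story_arcs(ads: list[AdRecord], min_support: int = 2):
    """Group ads by collapsed role sequence and rank arcs by mean retention.

    Returns (arc, number_of_ads, mean_retention) tuples, best first.
    Arcs seen in fewer than `min_support` ads are dropped as too sparse.
    """
    groups: dict[tuple[str, ...], list[float]] = defaultdict(list)
    for ad in ads:
        groups[collapse_repeats(ad.roles)].append(ad.retention)
    ranked = [
        (arc, len(scores), mean(scores))
        for arc, scores in groups.items()
        if len(scores) >= min_support
    ]
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked


# Hypothetical toy data; labels and scores are invented for illustration only.
example_ads = [
    AdRecord("a1", ["hook", "problem", "solution", "cta"], 0.62),
    AdRecord("a2", ["hook", "hook", "problem", "solution", "cta"], 0.58),
    AdRecord("a3", ["product_showcase", "cta"], 0.41),
    AdRecord("a4", ["product_showcase", "product_showcase", "cta"], 0.45),
    AdRecord("a5", ["testimonial", "cta"], 0.50),  # support 1, filtered out
]

if __name__ == "__main__":
    for arc, n_ads, avg in rank_story_arcs(example_ads):
        print(" -> ".join(arc), f"(n={n_ads}, mean retention={avg:.2f})")
```

Requiring a minimum support keeps a single high-retention ad from being reported as a "top-performing arc". A real analysis at the scale described would likely also control for subvertical and compare against non-story creatives before drawing conclusions.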