CV, CL | May 28, 2023

KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

arXiv:2305.18373v1224 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses image ad understanding, which is broadly relevant to the advertising industry, but it is incremental as it adapts existing models rather than introducing a new paradigm.

The paper tackles the problem of image ad understanding by benchmarking pre-trained vision-language models and proposing a feature adaptation strategy to fuse multimodal information and incorporate real-world entity knowledge, achieving improved performance on this under-explored task.

Image ad understanding is a crucial task with wide real-world applications. The task is highly challenging because it involves diverse atypical scenes, real-world entities, and reasoning over scene text, yet how to interpret image ads remains relatively under-explored, especially in the era of foundational vision-language models (VLMs), which feature impressive generalizability and adaptability. In this paper, we perform the first empirical study of image ad understanding through the lens of pre-trained VLMs. We benchmark these VLMs and reveal practical challenges in adapting them to image ad understanding. We propose a simple feature adaptation strategy that effectively fuses multimodal information for image ads, and we further empower it with knowledge of real-world entities. We hope our study draws more attention to image ad understanding, which is broadly relevant to the advertising industry.
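To make the idea of feature adaptation over a frozen VLM more concrete, here is a minimal sketch, not the paper's implementation. It assumes a generic residual adapter: an image embedding and a scene-text embedding are concatenated, passed through a small bottleneck, and blended back with the original image embedding before ranking candidate interpretations by cosine similarity. Every name here (`adapt_and_fuse`, `rank_candidates`, `alpha`, the weight shapes) is hypothetical, the random vectors merely stand in for embeddings a pre-trained VLM would produce, and entity knowledge is not modeled.

```python
import math
import random


def l2_normalize(v, eps=1e-8):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def matvec(v, w):
    """Multiply row vector v (length n) by matrix w (n rows, m columns)."""
    return [sum(v[i] * w[i][j] for i in range(len(v))) for j in range(len(w[0]))]


def adapt_and_fuse(image_feat, text_feat, w_down, w_up, alpha=0.5):
    """Hypothetical residual adapter over frozen features (an assumption)."""
    joint = image_feat + text_feat                      # concatenate: length 2d
    hidden = [max(0.0, h) for h in matvec(joint, w_down)]  # ReLU bottleneck
    delta = matvec(hidden, w_up)                        # project back to length d
    # Residual blend: alpha = 0 returns the original image feature unchanged.
    blended = [alpha * dl + (1 - alpha) * im for dl, im in zip(delta, image_feat)]
    return l2_normalize(blended)


def rank_candidates(query, candidates):
    """Rank candidate embeddings by cosine similarity to a unit-length query."""
    scores = [sum(q * c for q, c in zip(query, l2_normalize(cand)))
              for cand in candidates]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order, scores


# Toy data: random stand-ins for frozen VLM embeddings (not real model output).
rng = random.Random(0)
d, h = 8, 4
image_feat = l2_normalize([rng.gauss(0, 1) for _ in range(d)])
scene_text_feat = l2_normalize([rng.gauss(0, 1) for _ in range(d)])
candidates = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(3)]
w_down = [[rng.gauss(0, 0.1) for _ in range(h)] for _ in range(2 * d)]
w_up = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(h)]

fused = adapt_and_fuse(image_feat, scene_text_feat, w_down, w_up)
order, scores = rank_candidates(fused, candidates)
print("candidate ranking:", order)
```

In a real setting the adapter weights would be trained on a small amount of task data while the VLM stays frozen; the residual blend is one common way to keep the adapted feature close to the pre-trained one.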

