Automatic Understanding of Image and Video Advertisements
This work addresses the need for computer vision systems to interpret persuasive content in ads; its contribution is incremental in that it centers on dataset creation and baseline methods.
The paper tackles the problem of automatic advertisement understanding by proposing a novel task and creating two datasets: an image dataset of 64,832 ads and a video dataset of 3,477 ads, with baseline classification results for tasks like answering questions about ad messages.
There is more to images than their objective physical content: for example, advertisements are created to persuade a viewer to take a certain action. We propose the novel problem of automatic advertisement understanding. To enable research on this problem, we create two datasets: an image dataset of 64,832 image ads, and a video dataset of 3,477 ads. Our data contains rich annotations encompassing the topic and sentiment of the ads, questions and answers describing what actions the viewer is prompted to take and the reasoning that the ad presents to persuade the viewer ("What should I do according to this ad, and why should I do it?"), and symbolic references ads make (e.g. a dove symbolizes peace). We also analyze the most common persuasive strategies ads use, and the capabilities that computer vision systems should have to understand these strategies. We present baseline classification results for several prediction tasks, including automatically answering questions about the messages of the ads.
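To make the annotation types listed above concrete, here is a minimal sketch of how one annotated ad might be represented, together with a trivial majority-class reference point for topic prediction. The class name, field names, and example values are all hypothetical and are not the dataset's actual schema, and the majority-class function is only an illustration, not one of the paper's baselines.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AdAnnotation:
    """Hypothetical record for one annotated ad (illustrative field names only)."""
    ad_id: str
    topic: str  # assumed to be a single label per ad
    sentiments: List[str] = field(default_factory=list)
    # Free-form answers to "What should I do according to this ad, and why should I do it?"
    qa_answers: List[str] = field(default_factory=list)
    # Symbolic references: depicted object -> concept it stands for
    symbols: Dict[str, str] = field(default_factory=dict)


def majority_topic(train: List[AdAnnotation]) -> str:
    """Return the most frequent topic in the training records."""
    counts = Counter(ad.topic for ad in train)
    return counts.most_common(1)[0][0]


def topic_accuracy(predicted_topic: str, test: List[AdAnnotation]) -> float:
    """Fraction of test records whose topic equals the single predicted topic."""
    if not test:
        return 0.0
    hits = sum(1 for ad in test if ad.topic == predicted_topic)
    return hits / len(test)


# Made-up example records, for illustration only.
train_ads = [
    AdAnnotation("a1", "cars", ["excited"], ["I should buy this car because it is fast."]),
    AdAnnotation("a2", "cars"),
    AdAnnotation("a3", "charity", symbols={"dove": "peace"}),
]
test_ads = [AdAnnotation("b1", "cars"), AdAnnotation("b2", "charity")]

guess = majority_topic(train_ads)
score = topic_accuracy(guess, test_ads)
```

A learned classifier for any of the prediction tasks would be expected to beat this kind of majority-class score, which is why it is a useful sanity check when reporting baseline results.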