Which Ads to Show? Advertisement Image Assessment with Auxiliary Information via Multi-step Modality Fusion
This work addresses the need for better ad image assessment in online advertising by leveraging available auxiliary data, though its contribution is incremental because it builds on existing modality fusion techniques.
The paper tackles the problem of predicting aesthetic preference for advertisement images by incorporating auxiliary information such as tags and target subjects, achieving state-of-the-art performance on the AVA dataset and promising results on real-world ad data.
Assessing aesthetic preference is a fundamental task related to human cognition. It can also contribute to practical applications such as creating images for online advertisements. Image quality strongly influences preference. However, auxiliary information attached to ad images, such as tags and target subjects, can also determine which image is preferred. Existing studies focus mainly on the images themselves. They are therefore less useful in advertising scenarios, where rich auxiliary data are available. Here we propose a modality fusion-based neural network that evaluates the aesthetic preference of images together with their auxiliary information. Our method draws on findings from statistical analyses of real advertisement data. It makes full use of the auxiliary data through multi-step modality fusion, which combines a low-level fusion mechanism based on conditional batch normalization with a high-level fusion mechanism based on attention. Our approach achieves state-of-the-art performance on the AVA dataset, a widely used benchmark for aesthetic assessment. We also evaluate the proposed method on large-scale real-world advertisement image data with rich auxiliary attributes, where it yields promising preference prediction results. Through extensive experiments, we further investigate how image and auxiliary information jointly influence click-through rate.
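The abstract does not give the exact architecture, so the two fusion steps are illustrated below with a generic sketch rather than the authors' implementation. It assumes NumPy, a forward pass only, and two choices of our own:

- Low-level fusion uses the standard conditional batch normalization recipe. Per-channel scale and shift are predicted from an auxiliary embedding.
- High-level fusion is approximated by single-query scaled dot-product attention. The auxiliary embedding attends over spatial image regions.

The weight matrices (`w_gamma`, `w_beta`, `w_q`, `w_k`, `w_v`), the function names, and all shapes are hypothetical.

```python
import numpy as np

def conditional_batch_norm(x, aux, w_gamma, w_beta, eps=1e-5):
    """Low-level fusion sketch (hypothetical parameters).

    x:   (batch, channels, height, width) image feature map
    aux: (batch, aux_dim) auxiliary embedding, e.g. encoded tags / target subject
    """
    # Normalize each channel over the batch and spatial axes, as in batch norm.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Per-sample scale and shift predicted from the auxiliary data; "1 +" keeps
    # the layer close to plain batch norm when the predicted delta is small.
    gamma = 1.0 + aux @ w_gamma          # (batch, channels)
    beta = aux @ w_beta                  # (batch, channels)
    return gamma[:, :, None, None] * x_hat + beta[:, :, None, None]

def attention_fusion(regions, aux, w_q, w_k, w_v):
    """High-level fusion sketch: the auxiliary embedding queries image regions.

    regions: (batch, n_regions, d_img) spatial image features
    aux:     (batch, aux_dim)
    Returns the attended feature (batch, d_v) and attention weights (batch, n_regions).
    """
    q = aux @ w_q                        # (batch, d)
    k = regions @ w_k                    # (batch, n_regions, d)
    v = regions @ w_v                    # (batch, n_regions, d_v)
    scores = np.einsum("bd,bnd->bn", q, k) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over regions
    return np.einsum("bn,bnv->bv", weights, v), weights

# Usage with random stand-in data (no trained weights).
rng = np.random.default_rng(0)
B, C, H, W, A = 4, 8, 5, 5, 6
feat = rng.normal(size=(B, C, H, W))
aux = rng.normal(size=(B, A))
fused = conditional_batch_norm(feat, aux,
                               0.1 * rng.normal(size=(A, C)),
                               0.1 * rng.normal(size=(A, C)))
regions = fused.reshape(B, C, H * W).transpose(0, 2, 1)   # (B, H*W, C)
pooled, attn = attention_fusion(regions, aux,
                                rng.normal(size=(A, 16)),
                                rng.normal(size=(C, 16)),
                                rng.normal(size=(C, 16)))
print(fused.shape, pooled.shape, attn.shape)
```

The auxiliary data enters twice. It first modulates every normalized feature channel early in the network. It then selects which image regions contribute to the final representation. This ordering mirrors the low-level/high-level split the abstract describes. In a real model, `pooled` would feed a preference or click-through-rate prediction head, and all weights would be learned.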