CV · Aug 9, 2022

Aesthetic Attributes Assessment of Images with AMANv2 and DPC-CaptionsV2

Xinghui Zhou, Xin Jin, Jianwen Lv, Heng Huang, Ming Mao, Shuai Cui

arXiv:2208.04522v1 · 1.4 · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the problem of limited datasets for aesthetic attribute assessment in image analysis, offering a semi-automatic approach to scale up data collection for researchers in computer vision and aesthetics.

The paper tackles the task of aesthetic attribute captioning for images by constructing a new dataset, DPC-CaptionsV2, and proposing an improved model, AMANv2, which outperforms previous methods like CNN-LSTM and AMAN in predicting comments on attributes such as composition, lighting, color, and subject.

Image aesthetic quality assessment has been popular over the last decade. Besides numerical assessment, natural language assessment (aesthetic captioning) has been proposed to describe the general aesthetic impression of an image. In this paper, we propose aesthetic attribute assessment, i.e., aesthetic attribute captioning, which assesses aesthetic attributes such as composition, lighting usage, and color arrangement. Labeling comments on aesthetic attributes is a non-trivial task, which limits the scale of the corresponding datasets. We construct a novel dataset, named DPC-CaptionsV2, in a semi-automatic way. The knowledge is transferred from a small-scale dataset with full annotations to large-scale professional comments from a photography website. Images in DPC-CaptionsV2 carry comments on up to 4 aesthetic attributes: composition, lighting, color, and subject. Then, we propose a new version of Aesthetic Multi-Attributes Networks (AMANv2) based on the BUTD model and the VLPSA model. AMANv2 fuses features from a mixture of the small-scale PCCD dataset with full annotations and the large-scale DPC-CaptionsV2 dataset with full annotations. The experimental results on DPC-CaptionsV2 show that our method can predict comments on 4 aesthetic attributes, and that these comments are closer to aesthetic topics than those produced by the previous AMAN model. Under the standard evaluation criteria of image captioning, the specially designed AMANv2 model outperforms the CNN-LSTM model and the AMAN model.
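The abstract does not say how the knowledge transfer from the small annotated dataset to the large set of website comments is implemented. As a rough illustration only, the sketch below shows one generic way such a transfer could work: a bag-of-words Naive Bayes classifier (a technique chosen here for simplicity, not taken from the paper) is trained on a few attribute-labeled comments and then assigns an attribute to unlabeled comments. The function names, the toy comments, and the classifier choice are all assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy stand-in for a small, fully annotated dataset:
# each comment is paired with the aesthetic attribute it discusses.
LABELED = [
    ("The rule of thirds framing and balance work well", "composition"),
    ("Soft light and gentle shadows across the scene", "lighting"),
    ("Vivid saturated colors and warm tones", "color"),
    ("The subject is sharp and clearly the focus", "subject"),
]


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return [w for w in words if w]


def train(labeled):
    """Collect per-attribute word counts, comment counts, and the vocabulary."""
    counts = defaultdict(Counter)
    docs = Counter()
    for comment, attr in labeled:
        counts[attr].update(tokenize(comment))
        docs[attr] += 1
    vocab = {w for c in counts.values() for w in c}
    return counts, docs, vocab


def predict(model, comment):
    """Return the attribute with the highest Naive Bayes log-score."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    scores = {}
    for attr in docs:
        total_words = sum(counts[attr].values())
        score = math.log(docs[attr] / total_docs)  # class prior
        for w in tokenize(comment):
            if w in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing over the vocabulary.
                score += math.log((counts[attr][w] + 1) / (total_words + len(vocab)))
        scores[attr] = score
    return max(scores, key=scores.get)


model = train(LABELED)

# Hypothetical unlabeled comments standing in for the large-scale website data.
for text in ["Harsh shadows and flat light", "Warm tones and vivid colors"]:
    print(text, "->", predict(model, text))
```

A real pipeline would need far more labeled data and some manual checking of the predicted labels, which is presumably what makes the described construction semi-automatic rather than fully automatic.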

