Learning Photography Aesthetics with Deep CNNs
This work addresses the need for interpretable aesthetic assessment in photography: rather than only scoring a photo, it offers insight into why the photo is good or bad, an incremental step beyond existing single-score methods.
The paper tackles the problem of automatic photo aesthetic assessment by proposing a multitask deep CNN that jointly learns eight aesthetic attributes along with the overall aesthetic score, achieving near-human performance in score prediction.
Automatic photo aesthetic assessment is a challenging artificial intelligence task. Existing computational approaches have focused on modeling a single aesthetic score or a class (good or bad); however, these do not explain why a photograph is good or bad, or which attributes contribute to its quality. To obtain both accuracy and a human-interpretable score, we advocate learning the aesthetic attributes along with the prediction of the overall score. For this purpose, we propose a novel multitask deep convolutional neural network, which jointly learns eight aesthetic attributes along with the overall aesthetic score. We report near-human performance in the prediction of the overall aesthetic score. To understand the internal representation of these attributes in the learned model, we also develop a visualization technique using backpropagation of gradients. These visualizations highlight the important image regions for the corresponding attributes, thus providing insights into the model's representation of these attributes. We showcase the diversity and complexity associated with different attributes through a qualitative analysis of the activation maps.
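The two ideas in the abstract, joint prediction of eight attributes plus an overall score and gradient-based visualization, can be illustrated with a toy sketch. This is not the paper's network. It assumes a single shared ReLU layer standing in for the convolutional trunk, one linear head per task, and a summed squared-error loss. The names `make_model`, `forward`, `joint_loss`, and `saliency` are hypothetical, and the inputs are arbitrary numbers rather than image pixels.

```python
import random

NUM_ATTRIBUTES = 8              # from the abstract; attribute names are not given there
NUM_HEADS = NUM_ATTRIBUTES + 1  # eight attributes plus the overall score


def make_model(n_inputs, n_hidden, seed=0):
    """Random weights for a shared layer and one linear head per task (toy stand-in)."""
    rng = random.Random(seed)
    shared = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    heads = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(NUM_HEADS)]
    return shared, heads


def forward(model, x):
    """Return (shared hidden activations, one prediction per head)."""
    shared, heads = model
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in shared]
    outputs = [sum(w * h for w, h in zip(head, hidden)) for head in heads]
    return hidden, outputs


def joint_loss(outputs, targets):
    """Sum of squared errors over all nine tasks (an assumed loss, not the paper's)."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


def saliency(model, x, head_index):
    """Gradient of one head's output with respect to each input value."""
    shared, heads = model
    hidden, _ = forward(model, x)
    # Backpropagate through the chosen head, then through the ReLU
    # (the gradient passes only where the hidden unit is active).
    grad_hidden = [w if h > 0.0 else 0.0 for w, h in zip(heads[head_index], hidden)]
    # Backpropagate through the shared layer to reach the inputs.
    return [
        sum(g * shared[j][i] for j, g in enumerate(grad_hidden))
        for i in range(len(x))
    ]


if __name__ == "__main__":
    model = make_model(n_inputs=4, n_hidden=6)
    x = [0.2, 0.9, 0.1, 0.5]
    _, outputs = forward(model, x)
    print("predictions per head:", outputs)
    print("input gradients for head 0:", saliency(model, x, head_index=0))
```

Because every head reads the same hidden activations, minimizing the joint loss would shape one shared representation for all nine tasks. In this sketch, large-magnitude entries in the `saliency` output play the role that highlighted image regions play in the visualizations described above.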