The Focus-Aspect-Polarity Model for Predicting Subjective Noun Attributes in Images
This paper addresses fine-grained subjective interpretation in computer vision and is aimed at researchers in that field. The contribution is incremental, as it builds on existing attribute prediction methods.
The paper tackles the prediction of subjective noun attributes in images. It proposes the Focus-Aspect-Polarity model, introduces a novel dataset built on that model, and finds that fusing context information via tensor multiplication outperforms concatenation in several cases.
Subjective visual interpretation is a challenging yet important topic in computer vision. Many approaches reduce this problem to the prediction of adjective or attribute labels from images. However, most of these either do not take attribute semantics into account or process the image only holistically. Furthermore, there is a lack of relevant datasets with fine-grained subjective labels. In this paper, we propose the Focus-Aspect-Polarity model to structure the process of capturing subjectivity in image processing, and we introduce a novel dataset that follows this way of modeling. We run experiments on this dataset to compare several deep learning methods and find that incorporating context information through tensor multiplication outperforms the default way of information fusion (concatenation) in several cases.
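To make the contrast between the two fusion strategies concrete, the following is a minimal sketch, not the architecture used in the paper. It assumes that "tensor multiplication" means a bilinear map between an image feature vector and a context feature vector, which the abstract does not specify. The function names `fuse_concat` and `fuse_tensor`, the dimensions, and the random weights are all hypothetical and chosen only for illustration.

```python
import random


def fuse_concat(image_feat, context_feat, weight):
    """Concatenation fusion: stack both vectors, then apply one linear map.

    weight has shape (d_out, d_img + d_ctx). Image and context features
    contribute additively and never interact directly.
    """
    joint = image_feat + context_feat  # list concatenation
    return [sum(w * x for w, x in zip(row, joint)) for row in weight]


def fuse_tensor(image_feat, context_feat, tensor):
    """Bilinear (tensor) fusion, assumed here as one possible reading.

    tensor has shape (d_out, d_img, d_ctx) and
    out[k] = sum over i, j of tensor[k][i][j] * image_feat[i] * context_feat[j],
    so every image feature is multiplied with every context feature.
    """
    return [
        sum(
            slice_k[i][j] * image_feat[i] * context_feat[j]
            for i in range(len(image_feat))
            for j in range(len(context_feat))
        )
        for slice_k in tensor
    ]


# Hypothetical sizes and random weights, for illustration only.
random.seed(0)
d_img, d_ctx, d_out = 4, 3, 2
image_feat = [random.gauss(0, 1) for _ in range(d_img)]
context_feat = [random.gauss(0, 1) for _ in range(d_ctx)]
W = [[random.gauss(0, 1) for _ in range(d_img + d_ctx)] for _ in range(d_out)]
T = [
    [[random.gauss(0, 1) for _ in range(d_ctx)] for _ in range(d_img)]
    for _ in range(d_out)
]

concat_out = fuse_concat(image_feat, context_feat, W)
tensor_out = fuse_tensor(image_feat, context_feat, T)
```

Under this assumption, the difference is that concatenation combines the two inputs additively, whereas the bilinear form lets the context rescale how each image feature contributes to the output. The bilinear form also needs `d_out * d_img * d_ctx` weights instead of `d_out * (d_img + d_ctx)`, which is a practical cost to keep in mind.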