More cat than cute? Interpretable Prediction of Adjective-Noun Pairs
This work addresses the need of researchers and practitioners for interpretable sentiment analysis in multimedia, but it is incremental, as it builds on existing ANP prediction methods.
The paper tackles the problem of predicting adjective-noun pairs (ANPs) in visual content by disentangling the contributions of adjectives and nouns, which yields a more interpretable model that predicts 553 ANPs.
The increasing availability of affect-rich multimedia resources has bolstered interest in understanding sentiment and emotions in and from visual content. Adjective-noun pairs (ANPs) are a popular mid-level semantic construct for capturing affect via visually detectable concepts such as "cute dog" or "beautiful landscape". Current state-of-the-art methods approach ANP prediction by treating each of these compound concepts as an individual token, ignoring the underlying relationships within ANPs. This work aims to disentangle the contributions of the "adjectives" and "nouns" in the visual prediction of ANPs. Two specialised classifiers, one trained to detect adjectives and the other to detect nouns, are fused to predict 553 different ANPs. The resulting ANP prediction model is more interpretable, as it allows us to study the contributions of the adjective and noun components. Source code and models are available at https://imatge-upc.github.io/affective-2017-musa2/.
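To make the idea of fusing an adjective classifier with a noun classifier concrete, the following is a minimal sketch in plain Python. It is not the fusion used in the paper, whose details are not given in this abstract. It assumes a simple product rule over the two classifiers' probabilities, restricted to a list of valid ANPs. The function names (`softmax`, `fuse_anp_scores`), the toy vocabulary, and the logit values are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Turn a {label: logit} dict into a {label: probability} dict."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def fuse_anp_scores(adj_probs, noun_probs, anp_pairs):
    """Score each valid ANP as P(adjective) * P(noun), then renormalise.

    Assumption: a product rule stands in for the fusion step; the paper's
    actual fusion may differ.
    """
    raw = {
        (adj, noun): adj_probs[adj] * noun_probs[noun]
        for adj, noun in anp_pairs
    }
    total = sum(raw.values())
    return {pair: score / total for pair, score in raw.items()}


# Hypothetical classifier outputs (logits) for a single image.
adj_logits = {"cute": 2.0, "beautiful": 0.0}
noun_logits = {"dog": 1.0, "landscape": -1.0}

# Hypothetical subset of valid ANPs (the paper uses 553 of them).
anp_pairs = [("cute", "dog"), ("beautiful", "landscape"), ("beautiful", "dog")]

adj_probs = softmax(adj_logits)
noun_probs = softmax(noun_logits)
anp_probs = fuse_anp_scores(adj_probs, noun_probs, anp_pairs)

best_pair = max(anp_probs, key=anp_probs.get)
print(best_pair, round(anp_probs[best_pair], 3))

# Interpretability: inspect how much each component supports the top ANP.
print("adjective support:", round(adj_probs[best_pair[0]], 3))
print("noun support:", round(noun_probs[best_pair[1]], 3))
```

Because the adjective and noun scores stay separate until the last step, one can inspect each component's probability for any predicted ANP, which is the kind of analysis a token-per-ANP classifier does not allow.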