Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction
This paper addresses the problem of extracting obvious perceptual knowledge for NLP applications, but its contribution is incremental, as it builds on existing probing and ensemble techniques.
The paper tackled the challenge of predicting perceptual properties of nouns, which are rarely stated in text, by combining information from language models and images using an ensemble model calibrated by adjective concreteness scores. The results showed that this combination greatly improved noun property prediction compared to text-only models.
Neural language models encode rich knowledge about entities and their relationships, and this knowledge can be extracted from their representations using probing. Common properties of nouns (e.g., red strawberries, small ant) are, however, more challenging to extract than other types of knowledge because they are rarely stated explicitly in text. We hypothesize that this is mainly the case for perceptual properties, which are obvious to the participants in a communication. We propose to extract these properties from images and use them in an ensemble model to complement the information extracted from language models. We consider perceptual properties to be more concrete than abstract properties (e.g., interesting, flawless). We propose to use the adjectives' concreteness scores as a lever to calibrate the contribution of each source (text vs. images). We evaluate our ensemble model on a ranking task in which the actual properties of a noun must be ranked higher than non-relevant properties. Our results show that the proposed combination of text and images greatly improves noun property prediction compared to powerful text-based language models.
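The idea of using concreteness as a lever between the two sources can be illustrated with a minimal sketch. It assumes that each candidate adjective already has a text score (e.g., from probing a language model) and an image score (e.g., from an image-text model) on comparable scales, and that raw concreteness ratings lie on a 1 to 5 scale. The linear interpolation, the function names (`normalize_concreteness`, `ensemble_score`, `rank_properties`, `precision_at_k`), the precision@k metric, and all numbers below are hypothetical illustrations, not the paper's exact formulation, evaluation metric, or data.

```python
from typing import Dict, List, Set


def normalize_concreteness(rating: float, low: float = 1.0, high: float = 5.0) -> float:
    """Map a raw concreteness rating (assumed 1-5 scale) to a weight in [0, 1]."""
    weight = (rating - low) / (high - low)
    return min(1.0, max(0.0, weight))  # clamp ratings outside the assumed scale


def ensemble_score(text_score: float, image_score: float, concreteness: float) -> float:
    """Hypothetical linear ensemble: concrete adjectives lean on the image score,
    abstract adjectives lean on the text (language model) score."""
    w = normalize_concreteness(concreteness)
    return w * image_score + (1.0 - w) * text_score


def rank_properties(text_scores: Dict[str, float],
                    image_scores: Dict[str, float],
                    concreteness: Dict[str, float]) -> List[str]:
    """Rank candidate adjectives for one noun by their combined score (highest first)."""
    combined = {
        adj: ensemble_score(text_scores[adj], image_scores[adj], concreteness[adj])
        for adj in text_scores
    }
    return sorted(combined, key=combined.get, reverse=True)


def precision_at_k(ranking: List[str], gold: Set[str], k: int) -> float:
    """Fraction of the top-k ranked adjectives that are actual properties of the noun."""
    return sum(1 for adj in ranking[:k] if adj in gold) / k


# Toy example for the noun "strawberry"; all scores and ratings are made up.
text_scores = {"red": 0.30, "interesting": 0.40, "sweet": 0.35, "flawless": 0.20}
image_scores = {"red": 0.90, "interesting": 0.10, "sweet": 0.50, "flawless": 0.15}
concreteness = {"red": 4.5, "interesting": 1.8, "sweet": 4.0, "flawless": 1.9}
gold = {"red", "sweet"}

ranking = rank_properties(text_scores, image_scores, concreteness)
text_only = sorted(text_scores, key=text_scores.get, reverse=True)

print("ensemble: ", ranking, precision_at_k(ranking, gold, 2))
print("text only:", text_only, precision_at_k(text_only, gold, 2))
```

In this toy setting the text-only ranking puts the abstract adjective "interesting" first, while the concreteness-weighted ensemble lets the image score promote "red" and "sweet" to the top two positions. A weighted interpolation is only one plausible way to calibrate the sources; the same ranking-and-evaluate loop applies to any combination rule.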