CL, AI, CV | Aug 27, 2019

Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts

arXiv:1908.10285v1
AI Analysis

This work addresses the challenge of understanding context-dependent language in AI, but it is incremental: it builds on existing multi-modal models without major breakthroughs.

This work tackles the problem of modeling gradable adjectives such as 'big' and 'small' from visual contexts. It shows that state-of-the-art multi-modal models can learn to judge an object's size relative to the rest of a scene, but that their performance drops as tasks become more complex and that they fail to develop abstract representations of these adjectives that can be used compositionally.

This work aims at modeling how the meaning of gradable adjectives of size ('big', 'small') can be learned from visually-grounded contexts. Inspired by cognitive and linguistic evidence showing that the use of these expressions relies on setting a threshold that is dependent on a specific context, we investigate the ability of multi-modal models in assessing whether an object is 'big' or 'small' in a given visual scene. In contrast with the standard computational approach that simplistically treats gradable adjectives as 'fixed' attributes, we pose the problem as relational: to be successful, a model has to consider the full visual context. By means of four main tasks, we show that state-of-the-art models (but not a relatively strong baseline) can learn the function subtending the meaning of size adjectives, though their performance is found to decrease while moving from simple to more complex tasks. Crucially, models fail to develop abstract representations of gradable adjectives that can be used compositionally.
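The abstract contrasts a 'fixed' reading of size adjectives with a relational one, in which the cutoff for 'big' depends on the other objects in the scene. The Python sketch below illustrates that contrast under stated assumptions: the range-based threshold, the parameter `k = 0.3`, the fixed cutoff of 5.0, and the function names `context_thresholds`, `judge_relational`, and `judge_fixed` are all hypothetical illustrations, not the paper's actual threshold function or data-generation procedure.

```python
from typing import List, Tuple


def context_thresholds(scene_areas: List[float], k: float = 0.3) -> Tuple[float, float]:
    """Hypothetical range-based thresholds for 'big' and 'small'.

    Assumption (not from the paper): an object is 'big' if its area lies in
    the top k fraction of the scene's area range, and 'small' if it lies in
    the bottom k fraction.
    """
    lo, hi = min(scene_areas), max(scene_areas)
    span = hi - lo
    big_threshold = hi - k * span
    small_threshold = lo + k * span
    return big_threshold, small_threshold


def judge_relational(target_area: float, scene_areas: List[float], k: float = 0.3) -> str:
    """Label the target relative to its visual context (the whole scene)."""
    if max(scene_areas) == min(scene_areas):
        # All objects have the same size, so there is no contrast to exploit.
        return "neither"
    big_threshold, small_threshold = context_thresholds(scene_areas, k)
    if target_area >= big_threshold:
        return "big"
    if target_area <= small_threshold:
        return "small"
    return "neither"


def judge_fixed(target_area: float, cutoff: float = 5.0) -> str:
    """The 'fixed attribute' view: one global cutoff, context ignored."""
    return "big" if target_area >= cutoff else "small"


if __name__ == "__main__":
    # The same square (area 4) appears in two different scenes.
    scene_a = [1.0, 2.0, 4.0]   # the square is the largest object
    scene_b = [4.0, 9.0, 16.0]  # the square is the smallest object
    print(judge_relational(4.0, scene_a), judge_relational(4.0, scene_b))
    print(judge_fixed(4.0), judge_fixed(4.0))
```

Under these assumptions the relational judge labels the same square 'big' in the first scene and 'small' in the second, whereas the fixed-cutoff judge gives the same answer in both. This is the gap that, according to the abstract, motivates posing the task as relational.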
