CV, AI, CL, LG | Dec 15, 2020

Attention over learned object embeddings enables complex visual reasoning

arXiv:2012.08508v3 | 85 citations
AI Analysis

This work is significant for researchers and practitioners in AI and machine learning who are developing more general and robust neural networks capable of complex visual reasoning, potentially reducing the need for task-specific architectures.

This paper addresses the challenge of combining perception and higher-level reasoning in neural networks. The authors propose a general neural-network-based approach that achieves state-of-the-art performance across three distinct visual reasoning domains, outperforming specialized modular approaches in each case.

Neural networks have achieved success in a wide array of perceptual tasks but often fail at tasks involving both perception and higher-level reasoning. On these more challenging tasks, bespoke approaches (such as modular symbolic components, independent dynamics models or semantic parsers) targeted towards that specific type of task have typically performed better. The downside to these targeted approaches, however, is that they can be more brittle than general-purpose neural networks, requiring significant modification or even redesign according to the particular task at hand. Here, we propose a more general neural-network-based approach to dynamic visual reasoning problems that obtains state-of-the-art performance on three different domains, in each case outperforming bespoke modular approaches tailored specifically to the task. Our method relies on learned object-centric representations, self-attention and self-supervised dynamics learning, and all three elements together are required for strong performance to emerge. The success of this combination suggests that there may be no need to trade off flexibility for performance on problems involving spatio-temporal or causal-style reasoning. With the right soft biases and learning objectives in a neural network we may be able to attain the best of both worlds.
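To make the self-attention ingredient named in the abstract concrete, the following is a minimal sketch of single-head scaled dot-product self-attention applied to a small set of per-object embedding vectors. It is a generic illustration, not the authors' architecture: the object embeddings and projection matrices are random placeholders (in a real model both would be learned), and the names `self_attention`, `random_matrix`, `num_objects` and `dim` are hypothetical. The object-centric encoder and the self-supervised dynamics objective are not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def self_attention(objects, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over object embeddings.

    objects: list of N embedding vectors, one per object (placeholder data).
    w_q, w_k, w_v: projection matrices; random here, learned in practice.
    Returns (outputs, weights): N updated vectors and the N x N attention matrix.
    """
    queries = [matvec(w_q, o) for o in objects]
    keys = [matvec(w_k, o) for o in objects]
    values = [matvec(w_v, o) for o in objects]
    scale = math.sqrt(len(keys[0]))

    outputs, weights = [], []
    for q in queries:
        # Similarity of this object's query to every object's key.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Each output is an attention-weighted sum of the value vectors.
        out = [
            sum(a * v[i] for a, v in zip(attn, values))
            for i in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs, weights


def random_matrix(rows, cols, rng):
    """Random Gaussian matrix standing in for learned projection weights."""
    std = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


# Assumed toy sizes: 4 objects, 8-dimensional embeddings.
rng = random.Random(0)
num_objects, dim = 4, 8
objects = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_objects)]
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

outputs, weights = self_attention(objects, w_q, w_k, w_v)
```

Each row of `weights` is a probability distribution over the objects, so every updated embedding in `outputs` mixes information from all objects in proportion to how strongly the query object attends to them.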

Code Implementations: 1 repo