ML · CV · LG · Dec 3, 2022

Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests

arXiv:2212.01639v1 · 10 citations · h-index: 54
Originality: Incremental advance
AI Analysis

This paper addresses a challenging visual reasoning task for AI systems, offering incremental improvements in handling viewpoint changes on synthetic datasets.

The paper tackles the problem of answering questions about a scene as it would appear from a different viewpoint, using only a single image. To do so, it introduces the CLEVR-MRT dataset and explores neural architectures that infer volumetric scene representations, which can then be transformed to match the queried viewpoint. The results show that volumetric representations are effective, with specific model variants outperforming standard methods in controlled experiments.

Different types of mental rotation tests have been used extensively in psychology to understand human visual reasoning and perception. Understanding what an object or visual scene would look like from another viewpoint is a challenging problem that is made even harder if it must be performed from a single image. We explore a controlled setting whereby questions are posed about the properties of a scene if that scene were observed from another viewpoint. To do this, we have created a new version of the CLEVR dataset that we call CLEVR Mental Rotation Tests (CLEVR-MRT). Using CLEVR-MRT, we examine standard methods, show how they fall short, and then explore novel neural architectures that involve inferring volumetric representations of a scene. These volumes can be manipulated via camera-conditioned transformations to answer the question. We compare different model variants through rigorous ablations and demonstrate the efficacy of volumetric representations.
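The abstract says the inferred volumes are manipulated via camera-conditioned transformations. As a minimal sketch of what such a transformation could look like, assuming (our assumption, not a detail stated here) that the viewpoint change is a rigid rotation by an azimuth angle `theta` about the scene's vertical axis, the NumPy code below resamples a `(C, D, H, W)` feature volume using inverse mapping and nearest-neighbour lookup. The function names `rotation_about_y` and `rotate_volume` are hypothetical and do not come from the paper's code; a trainable model would more likely use a differentiable (e.g. trilinear) sampler.

```python
import numpy as np


def rotation_about_y(theta: float) -> np.ndarray:
    """3x3 matrix rotating points by `theta` radians about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def rotate_volume(volume: np.ndarray, theta: float) -> np.ndarray:
    """Resample a (C, D, H, W) feature volume after rotating it by `theta` about y.

    Hypothetical helper: inverse mapping with nearest-neighbour lookup;
    output voxels whose source falls outside the grid are set to zero.
    """
    _, D, H, W = volume.shape
    # Voxel-centre coordinates in normalised [-1, 1] space; x runs along W,
    # y along H and z along D.
    zs, ys, xs = np.meshgrid(np.linspace(-1, 1, D),
                             np.linspace(-1, 1, H),
                             np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs, ys, zs], axis=-1)  # shape (D, H, W, 3)

    # Inverse mapping: output voxel p reads from source point R^T p.
    r_inv = rotation_about_y(theta).T
    src = np.einsum("ij,dhwj->dhwi", r_inv, coords)

    # Convert normalised source coordinates back to (rounded) grid indices.
    xi = np.rint((src[..., 0] + 1) / 2 * (W - 1)).astype(int)
    yi = np.rint((src[..., 1] + 1) / 2 * (H - 1)).astype(int)
    zi = np.rint((src[..., 2] + 1) / 2 * (D - 1)).astype(int)
    valid = (0 <= xi) & (xi < W) & (0 <= yi) & (yi < H) & (0 <= zi) & (zi < D)

    # Gather source features for every output voxel that maps inside the grid.
    out = np.zeros_like(volume)
    out[:, valid] = volume[:, zi[valid], yi[valid], xi[valid]]
    return out
```

For a cubic grid, `rotate_volume(v, np.pi / 2)` reproduces `np.rot90(v, 1, axes=(1, 3))`, a quick check that the rotation acts about the intended axis. Inverse mapping (asking, for each output voxel, where it came from) fills every output voxel exactly once, unlike pushing input voxels forward; nearest-neighbour sampling keeps the sketch short but gives no useful gradient with respect to the sampling coordinates, which is why trilinear sampling is the usual choice in learned models.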

Code Implementations: 1 repo