CV · Mar 27, 2025

Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting

Berkeley
arXiv:2503.21770v1 · 10 citations · h-index: 111
Originality: Incremental advance
AI Analysis

The paper addresses scene understanding for computer vision researchers, but the advance is incremental because it builds on existing inpainting techniques.

The paper introduces Visual Jenga, a scene understanding task that involves removing objects from images to reveal dependencies, and proposes a data-driven, training-free method using counterfactual inpainting to quantify object relationships.

This paper proposes a novel scene understanding task called Visual Jenga. Drawing inspiration from the game Jenga, the proposed task involves progressively removing objects from a single image until only the background remains. Just as Jenga players must understand structural dependencies to maintain tower stability, our task reveals the intrinsic relationships between scene elements by systematically exploring which objects can be removed while preserving scene coherence in both a physical and a geometric sense. As a starting point for tackling the Visual Jenga task, we propose a simple, data-driven, training-free approach that is surprisingly effective on a range of real-world images. The principle behind our approach is to exploit the asymmetry in the pairwise relationships between objects within a scene and to employ a large inpainting model to generate a set of counterfactuals that quantify this asymmetry.
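The removal loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the scoring rule, so the sketch assumes one: an object that the inpainter keeps painting back when it is masked out is treated as something other objects depend on, and is therefore removed late. The names `Inpainter`, `regeneration_rate`, `removal_order`, and `make_toy_inpainter` are hypothetical, and the real inpainting model is replaced by a deterministic stand-in driven by a hand-written "rests on" table.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: (masked object, objects still in the scene)
# -> label of whatever the model painted into the masked region.
Inpainter = Callable[[str, Sequence[str]], str]


def regeneration_rate(target: str, scene: Sequence[str],
                      inpaint: Inpainter, n_samples: int = 8) -> float:
    """Fraction of counterfactual samples in which the masked object reappears.

    Assumption of this sketch: a high rate means the rest of the scene
    "needs" the object, so it should not be removed yet.
    """
    hits = sum(inpaint(target, scene) == target for _ in range(n_samples))
    return hits / n_samples


def removal_order(objects: Sequence[str], inpaint: Inpainter,
                  n_samples: int = 8) -> List[str]:
    """Greedily remove the object the scene depends on least, until none remain."""
    remaining = list(objects)
    order: List[str] = []
    while remaining:
        # Pick the object with the lowest regeneration rate; ties go to
        # the earliest object in the list.
        nxt = min(remaining,
                  key=lambda o: regeneration_rate(o, remaining, inpaint, n_samples))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def make_toy_inpainter(rests_on: Dict[str, str]) -> Inpainter:
    """Stand-in for a large inpainting model (illustration only).

    It paints the masked object back whenever another object still in the
    scene rests on it, and paints plain background otherwise.
    """
    def inpaint(target: str, scene: Sequence[str]) -> str:
        needed = any(rests_on.get(o) == target for o in scene if o != target)
        return target if needed else "background"
    return inpaint


# Toy scene: a cup and a book both rest on a table.
toy = make_toy_inpainter({"cup": "table", "book": "table"})
order = removal_order(["table", "cup", "book"], toy)
print(order)
```

In this toy scene the table is removed last, because masking it while the cup or book is present always brings it back. With a real inpainting model the comparison would be made on image content rather than on labels, and the samples would be stochastic.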

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
