AI · CV · MA · Nov 11, 2025

How Modality Shapes Perception and Reasoning: A Study of Error Propagation in ARC-AGI

arXiv:2511.15717v1
Originality: Incremental advance
AI Analysis

The paper addresses error propagation in systematic generalization, a concern for AI researchers, but the advance is incremental: it builds on existing methods without introducing a new paradigm.

The study examines how input modality (text vs. images) affects perception and reasoning on ARC-AGI tasks. It finds that structured text improves coordinate precision, that images capture 2D shapes but are resolution-sensitive, and that combining the two improves execution by about 8 perception points and about 0.20 median similarity.

ARC-AGI and ARC-AGI-2 measure generalization-through-composition on small color-quantized grids, and their prize competitions make progress on these harder held-out tasks a meaningful proxy for systematic generalization. Recent instruction-first systems translate grids into concise natural-language or DSL rules executed in generate-execute-select loops, yet we lack a principled account of how encodings shape model perception and how to separate instruction errors from execution errors. We hypothesize that modality imposes perceptual bottlenecks -- text flattens 2D structure into 1D tokens while images preserve layout but can introduce patch-size aliasing -- thereby shaping which grid features are reliably perceived. To test this, we isolate perception from reasoning across nine text and image modalities using a weighted set-disagreement metric and a two-stage reasoning pipeline, finding that structured text yields precise coordinates on sparse features, images capture 2D shapes yet are resolution-sensitive, and combining them improves execution (about 8 perception points; about 0.20 median similarity). Overall, aligning representations with transformer inductive biases and enabling cross-validation between text and image yields more accurate instructions and more reliable execution without changing the underlying model.
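
To make the contrast between text encodings concrete, here is a minimal sketch of two ways a small color-quantized grid might be serialized as text. The function names, the example grid, and the choice of 0 as background are illustrative assumptions; they are not taken from the paper, and the paper's nine modalities may differ from both.

```python
from typing import Dict, List, Tuple

# Assumption: a grid is a list of rows, each cell a small integer color index.
Grid = List[List[int]]


def flatten_rows(grid: Grid) -> str:
    """Serialize the grid row by row (hypothetical helper).

    Vertical adjacency is only implicit here: a cell and the one below it
    end up a full row-width apart in the resulting 1D string.
    """
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


def sparse_coordinates(
    grid: Grid, background: int = 0
) -> Dict[int, List[Tuple[int, int]]]:
    """List explicit (row, col) positions of non-background cells by color
    (hypothetical helper). The background value 0 is an assumption."""
    cells: Dict[int, List[Tuple[int, int]]] = {}
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            if color != background:
                cells.setdefault(color, []).append((r, c))
    return cells


# Illustrative grid, not from any ARC task.
grid = [
    [0, 0, 0, 0],
    [0, 3, 3, 0],
    [0, 0, 0, 5],
]

print(flatten_rows(grid))
print(sparse_coordinates(grid))
```

The row-wise string keeps every cell but leaves positions to be inferred by counting characters, whereas the coordinate listing states positions outright for the few non-background cells, which is one plausible reading of why structured text would help on sparse features.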

