AI · Mar 24

ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence

arXiv:2603.24621 · 50.011 citations
Predicted impact top 73% in AI · last 90 days · Originality: Synthesis-oriented
AI Analysis

ARC-AGI-3 provides a new challenge for assessing adaptive efficiency in AI agents, though the contribution is incremental because it builds on the prior ARC-AGI benchmarks.

The authors introduced ARC-AGI-3, an interactive benchmark for evaluating agentic intelligence in abstract, turn-based environments, where humans solve 100% of tasks while frontier AI systems score below 1% as of March 2026.

We introduce ARC-AGI-3, an interactive benchmark for studying agentic intelligence through novel, abstract, turn-based environments in which agents must explore, infer goals, build internal models of environment dynamics, and plan effective action sequences without explicit instructions. Like its predecessors ARC-AGI-1 and 2, ARC-AGI-3 focuses entirely on evaluating fluid adaptive efficiency on novel tasks, while avoiding language and external knowledge. ARC-AGI-3 environments only leverage Core Knowledge priors and are difficulty-calibrated via extensive testing with human test-takers. Our testing shows humans can solve 100% of the environments, in contrast to frontier AI systems which, as of March 2026, score below 1%. In this paper, we present the benchmark design, its efficiency-based scoring framework grounded in human action baselines, and the methodology used to construct, validate, and calibrate the environments.
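
The abstract describes an efficiency-based scoring framework grounded in human action baselines but does not give the formula. The sketch below is only an illustration of what such a score could look like, not the paper's actual metric. It assumes an unsolved environment scores zero and a solved one scores the ratio of the human baseline action count to the agent's action count, capped at 1. The names `efficiency_score` and `benchmark_score`, and the capped-ratio rule itself, are hypothetical.

```python
from typing import Iterable, Tuple


def efficiency_score(agent_actions: int, human_baseline_actions: int, solved: bool) -> float:
    """Hypothetical per-environment score in [0, 1].

    Assumption (not from the paper): an unsolved environment scores 0, and a
    solved one scores human_baseline_actions / agent_actions, capped at 1.
    """
    if not solved or agent_actions <= 0:
        return 0.0
    return min(1.0, human_baseline_actions / agent_actions)


def benchmark_score(results: Iterable[Tuple[int, int, bool]]) -> float:
    """Mean of the hypothetical per-environment scores.

    Each result is (agent_actions, human_baseline_actions, solved).
    """
    scores = [efficiency_score(a, h, s) for a, h, s in results]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative, made-up numbers: one inefficient solve, one efficient solve,
# and one failure.
example = [(200, 100, True), (50, 100, True), (500, 100, False)]
overall = benchmark_score(example)
```

Under these assumptions, an agent that solves an environment with twice as many actions as the human baseline earns half credit for it, so the score rewards action efficiency as well as task completion.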

