AI · LG · Oct 31, 2025

ARC-GEN: A Mimetic Procedural Benchmark Generator for the Abstraction and Reasoning Corpus

arXiv:2511.00162v2 · 9.63 citations · h-index: 1 · Has Code
Originality: Incremental advance
AI Analysis

This work eases a bottleneck for researchers and developers using the ARC-AGI benchmark, namely the small number of training examples per task, by providing a tool that generates more of them. The contribution is incremental, since it builds on existing benchmark efforts.

The paper tackles the limited demonstration set in the ARC-AGI benchmark by introducing ARC-GEN, an open-source procedural generator that covers all 400 training tasks and extends the training dataset while closely mimicking the distributional properties of the original.

The Abstraction and Reasoning Corpus remains one of the most compelling and challenging benchmarks for tracking progress toward achieving Artificial General Intelligence. In contrast to other evaluation datasets designed to assess an agent's task-specific skills or accumulated knowledge, the ARC-AGI suite is specifically targeted at measuring skill acquisition efficiency, a trait that has (so far) been lacking in even the most sophisticated machine learning systems. For algorithms that require extensive intra-task exemplars, a significant constraint imposed by ARC-AGI is the modest cardinality of its demonstration set, comprising a small number of $\langle$ input, output $\rangle$ grids per task specifying the corresponding transformation. To embellish the space of viable sample pairs, this paper introduces ARC-GEN, an open-source procedural generator aimed at extending the original ARC-AGI training dataset as faithfully as possible. Unlike prior efforts, our generator is both exhaustive (covering all four-hundred tasks) and mimetic (more closely honoring the distributional properties and characteristics embodied in the initial ARC-AGI-1 release). We also discuss the use of this generator in establishing a static benchmark suite to verify the correctness of programs submitted to the 2025 Google Code Golf Championship.
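To make the idea of a procedural generator and a static verification suite concrete, here is a minimal sketch for a single toy task. It is not ARC-GEN's code or API: the task (mirror the grid left to right), the function names `generate_pair`, `build_suite`, and `verify`, and the size and colour sampling choices are all assumptions made for illustration. The only ARC conventions relied on are that grids are rectangular arrays of integers 0 to 9.

```python
import random

Grid = list[list[int]]
Pair = tuple[Grid, Grid]


def generate_pair(rng: random.Random, max_size: int = 10) -> Pair:
    """Sample one (input, output) pair for a hypothetical 'mirror' task."""
    height = rng.randint(3, max_size)
    width = rng.randint(3, max_size)
    # Mostly background (0) with a sprinkling of coloured cells (1-9).
    # This sampling scheme is an assumption, not the paper's distribution.
    grid = [
        [rng.randint(1, 9) if rng.random() < 0.25 else 0 for _ in range(width)]
        for _ in range(height)
    ]
    # The task's transformation: reverse every row.
    output = [list(reversed(row)) for row in grid]
    return grid, output


def build_suite(seed: int, n: int) -> list[Pair]:
    """Build a static suite: a fixed seed makes the pairs reproducible."""
    rng = random.Random(seed)
    return [generate_pair(rng) for _ in range(n)]


def verify(solver, suite: list[Pair]) -> bool:
    """Return True only if the solver reproduces every expected output."""
    return all(solver(inp) == out for inp, out in suite)


suite = build_suite(seed=0, n=100)
print(verify(lambda g: [row[::-1] for row in g], suite))
```

Fixing the seed is what turns a generator into a static benchmark: every submitted program is checked against the same pairs. A mimetic generator in the paper's sense would additionally tune the sampling of sizes, colours, and object layouts per task so that generated pairs resemble the original demonstrations, which this sketch does not attempt.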
