AI · Apr 6, 2025

AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence

arXiv:2504.04430v8

Originality: Highly original

AI Analysis

AGITB provides a rigorous and interpretable benchmark for evaluating progress toward artificial general intelligence, a foundational problem in AI research.

The paper addresses the lack of a unified measure of artificial general intelligence by introducing AGITB, a benchmark suite of fourteen elementary tests that evaluate how well a model forecasts temporal sequences without pretraining. It reports that the human cortex satisfies all tests, whereas no current AI system does.

Current AI systems demonstrate remarkable capabilities yet remain specialised, in part because no unified measure of general intelligence has been established. Existing evaluation frameworks, which focus primarily on language or perception tasks, offer limited insight into generality. The Artificial General Intelligence Testbed (AGITB) introduces a complementary benchmarking suite of fourteen elementary tests, with thirteen implemented as fully automated procedures. AGITB evaluates models on their ability to forecast the next input in a temporal sequence, step by step, without pretraining, symbolic manipulation, or semantic grounding. The framework isolates core computational invariants, such as determinism, sensitivity, and generalisation, that parallel principles of biological information processing. Designed to resist brute-force or memorisation-based strategies, AGITB enforces unbiased and autonomous learning. The human cortex satisfies all tests, whereas no current AI system meets the full AGITB criteria, demonstrating AGITB's value as a rigorous, interpretable, and actionable benchmark for evaluating progress toward artificial general intelligence. A reference implementation of AGITB is freely available on GitHub.
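To make the evaluation setting concrete, the following is a minimal sketch of step-by-step next-input forecasting together with a determinism check, one of the invariants named above. It is not the AGITB reference implementation or its API. The `step` interface, the binary-tuple input encoding, the toy `LastInputModel`, and the helper names are all hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: each input is a fixed-width binary pattern. The abstract only
# says "signal-level"; this encoding is chosen for illustration.
Bits = Tuple[int, ...]


class LastInputModel:
    """Hypothetical toy model: predicts that the next input repeats the current one.

    It starts with no pretraining and changes state only from the inputs it sees.
    """

    def __init__(self, width: int) -> None:
        self.prediction: Bits = (0,) * width

    def step(self, observed: Bits) -> Bits:
        # Consume one input and return the forecast for the next input.
        self.prediction = observed
        return self.prediction


def run_online(model: LastInputModel, sequence: Sequence[Bits]) -> List[Bits]:
    """Feed inputs one at a time and collect the forecast made after each."""
    return [model.step(x) for x in sequence]


def count_hits(predictions: Sequence[Bits], sequence: Sequence[Bits]) -> int:
    """Score the forecast made at step t against the actual input at step t + 1."""
    return sum(p == nxt for p, nxt in zip(predictions, sequence[1:]))


def is_deterministic(make_model: Callable[[], LastInputModel],
                     sequence: Sequence[Bits]) -> bool:
    """Two freshly created models given the same inputs must forecast identically."""
    return run_online(make_model(), sequence) == run_online(make_model(), sequence)


if __name__ == "__main__":
    seq = [(1, 0), (1, 0), (0, 1), (0, 1)]
    preds = run_online(LastInputModel(2), seq)
    print(count_hits(preds, seq), "of", len(seq) - 1, "forecasts correct")
    print("deterministic:", is_deterministic(lambda: LastInputModel(2), seq))
```

The toy model only copies its last input, so it illustrates the interface rather than any learning ability; a model meant to satisfy tests such as generalisation would need to adapt its internal state to structure in the sequence.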
