AI | Jun 10, 2025

ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering

arXiv:2506.09050v2 | 17 citations | h-index: 16
Originality: Synthesis-oriented
AI Analysis

This work addresses how to assess AI systems on practical algorithm engineering, a question relevant to both researchers and developers. Its contribution is somewhat incremental, since it builds on existing contest data.

The paper introduces ALE-Bench, a benchmark for evaluating AI systems on long-horizon, objective-driven algorithm engineering tasks drawn from real optimization problems. It finds that frontier LLMs perform well on specific problems but lag behind humans in consistency across problems and in long-horizon problem solving.

How well do AI systems perform in algorithm engineering for hard optimization problems in domains such as package-delivery routing, crew scheduling, factory production planning, and power-grid balancing? We introduce ALE-Bench, a new benchmark for evaluating AI systems on score-based algorithmic programming contests. Drawing on real tasks from the AtCoder Heuristic Contests, ALE-Bench presents optimization problems that are computationally hard and admit no known exact solution. Unlike short-duration, pass/fail coding benchmarks, ALE-Bench encourages iterative solution refinement over long time horizons. Our software framework supports interactive agent architectures that leverage test-run feedback and visualizations. Our evaluation of frontier LLMs revealed that while they demonstrate high performance on specific problems, a notable gap remains compared to humans in terms of consistency across problems and long-horizon problem-solving capabilities. This highlights the need for this benchmark to foster future AI advancements.
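To make the contrast with pass/fail benchmarks concrete, here is a toy illustration of score-based, iterative refinement. It is not part of ALE-Bench and does not use its API. The problem (find a short closed tour through points, a loose stand-in for delivery routing) and the names `tour_length` and `refine` are hypothetical; real AtCoder Heuristic Contest problems and their scoring are far richer. Every permutation is a valid answer, so there is no pass/fail verdict, only a score that improves as the solution is refined.

```python
import math
import random

# Hypothetical toy task, NOT an ALE-Bench problem: minimize the length of a
# closed tour visiting every point once (a simplified routing objective).

def tour_length(points, tour):
    """Score to minimize: total Euclidean length of the closed tour."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def refine(points, iterations=2000, seed=0):
    """Hill climbing with random 2-opt moves (reverse a tour segment).

    Returns the best tour found and the history of improving scores,
    mimicking the 'keep refining while the score improves' loop.
    """
    rng = random.Random(seed)
    tour = list(range(len(points)))  # any permutation is a valid solution
    best = tour_length(points, tour)
    history = [best]
    for _ in range(iterations):
        i, j = sorted(rng.sample(range(len(points)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        score = tour_length(points, candidate)
        if score < best:  # accept strict improvements only
            tour, best = candidate, score
            history.append(best)
    return tour, history

if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(30)]
    tour, history = refine(pts)
    print(f"initial score {history[0]:.3f} -> refined score {history[-1]:.3f}")
```

Plain hill climbing is used here only because it is the simplest refinement loop; contest solutions commonly use stronger local-search methods, and the agents evaluated on the benchmark additionally rewrite the program itself between test runs.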

Code Implementations: 1 repo
