AI · Oct 3, 2025

Take Goodhart Seriously: Principled Limit on General-Purpose AI Optimization

arXiv:2510.02840v1 · h-index: 3
Originality: Highly original
AI Analysis

The paper addresses a foundational problem in AI safety by highlighting the risks of uncontrolled optimization in general-purpose systems, though its contribution is incremental in that it builds on existing mathematical results.

The paper argues that the Objective Satisfaction Assumption (OSA) fails in realistic conditions: approximation, estimation, and optimization errors, together with inevitable misspecification, produce systematic deviations from the intended objective. Under optimization pressure these deviations can collapse into Goodhart's law failure modes, so the authors call for a principled limit on General-Purpose AI optimization to prevent loss of control.

A common but rarely examined assumption in machine learning is that training yields models that actually satisfy their specified objective function. We call this the Objective Satisfaction Assumption (OSA). Although deviations from OSA are acknowledged, their implications are overlooked. We argue, in a learning-paradigm-agnostic framework, that OSA fails in realistic conditions: approximation, estimation, and optimization errors guarantee systematic deviations from the intended objective, regardless of the quality of its specification. Beyond these technical limitations, perfectly capturing and translating the developer's intent, such as alignment with human preferences, into a formal objective is practically impossible, making misspecification inevitable. Building on recent mathematical results, we argue that, absent a mathematical characterization of these gaps, they are indistinguishable from those that collapse into Goodhart's law failure modes under strong optimization pressure. Because the Goodhart breaking point cannot be located ex ante, a principled limit on the optimization of General-Purpose AI systems is necessary. Absent such a limit, continued optimization is liable to push systems into predictable and irreversible loss of control.
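The Goodhart dynamic described in the abstract can be made concrete with a toy sketch. This is not the paper's framework. The functions `intended` and `proxy`, the quadratic form, and the hill-climbing loop are all hypothetical choices made for illustration. The sketch assumes a specified objective that agrees with the intended one only near the starting point, then shows what happens as optimization pressure on the specified objective increases.

```python
def intended(x: float) -> float:
    """Hypothetical intended objective: improves up to x = 5, then degrades."""
    return x - 0.1 * x * x


def proxy(x: float) -> float:
    """Hypothetical specified objective: tracks `intended` only near x = 0."""
    return x


def optimize_proxy(steps: int, step_size: float = 1.0) -> list[float]:
    """Greedy hill climbing on the proxy; returns every point visited."""
    x = 0.0
    path = [x]
    for _ in range(steps):
        # Move in whichever direction raises the proxy score.
        x = max((x - step_size, x + step_size), key=proxy)
        path.append(x)
    return path


path = optimize_proxy(steps=10)
proxy_scores = [proxy(x) for x in path]
intended_scores = [intended(x) for x in path]

# Step at which the intended objective is highest along the optimization path.
breaking_point = max(range(len(path)), key=lambda i: intended_scores[i])

for i, (p, t) in enumerate(zip(proxy_scores, intended_scores)):
    print(f"step {i:2d}  proxy={p:5.1f}  intended={t:5.2f}")
print("intended objective peaks at step", breaking_point)
```

In this toy setting the proxy score rises at every step, while the intended score rises, peaks, and then falls back to its starting value. The peak is visible here only because `intended` is known and can be evaluated. The abstract's point is that in realistic settings the gap between the two is not characterized, so the analogous breaking point cannot be located in advance.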

