LG · Mar 22

Active Inference Agency Formalization, Metrics, and Convergence Assessments

arXiv:2603.21319 · 2.5

Predicted impact: top 99% in LG · last 90 days

Originality: Incremental advance

AI Analysis

The paper addresses AI safety concerns by providing tools to detect undesirable inner optimization in complex systems, though the approach appears incremental relative to existing active inference frameworks.

This paper tackles the problem of mesa-optimization in AI safety by formalizing agency as a continuous representation that balances curiosity and empowerment. It shows that the proposed agency function is smooth and convex, with logarithmic convergence in sparse environments, and argues that this implies a high probability of agency emerging spontaneously in large models.

This paper addresses the critical challenge of mesa-optimization in AI safety by providing a formal definition of agency and a framework for its analysis. Agency is conceptualized as a Continuous Representation of accumulated experience that achieves autopoiesis through a dynamic balance between curiosity (minimizing prediction error to ensure non-computability and novelty) and empowerment (maximizing the control channel's information capacity to ensure subjectivity and goal-directedness). Empirical evidence suggests that this active inference-based model successfully accounts for classical instrumental goals, such as self-preservation and resource acquisition. The analysis demonstrates that the proposed agency function is smooth and convex, possessing favorable properties for optimization. While agentic functions occupy a vanishingly small fraction of the total abstract function space, they exhibit logarithmic convergence in sparse environments. This suggests a high probability for the spontaneous emergence of agency during the training of modern, large-scale models. To quantify the degree of agency, the paper introduces a metric based on the distance between the behavioral equivalents of a given system and an "ideal" agentic function within the space of canonicalized rewards (STARC). This formalization provides a concrete apparatus for classifying and detecting mesa-optimizers by measuring their proximity to an ideal agentic objective, offering a robust tool for analyzing and identifying undesirable inner optimization in complex AI systems.
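The agency metric described above, a distance to an "ideal" agentic function in the space of canonicalized rewards, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: rewards are flat NumPy arrays over transitions, mean subtraction stands in for a full STARC canonicalization (it only removes constant offsets), and the names `canonicalize`, `standardize`, `reward_distance`, and `agency_score` are hypothetical.

```python
import numpy as np


def canonicalize(reward):
    """Stand-in canonicalization: subtract the mean reward.

    Assumption: this removes only constant offsets. It is a simplified
    placeholder, not the canonicalization used in the paper.
    """
    r = np.asarray(reward, dtype=float).ravel()
    return r - r.mean()


def standardize(reward):
    """Canonicalize, then rescale to unit L2 norm so positive scaling is ignored."""
    c = canonicalize(reward)
    norm = np.linalg.norm(c)
    # A constant reward canonicalizes to the zero vector; return it unscaled
    # instead of dividing by zero.
    return c if norm == 0 else c / norm


def reward_distance(r1, r2):
    """Half the L2 distance between standardized rewards; lies in [0, 1]."""
    return 0.5 * np.linalg.norm(standardize(r1) - standardize(r2))


def agency_score(system_reward, ideal_reward):
    """Hypothetical score: 1 means behaviorally identical to the ideal objective."""
    return 1.0 - reward_distance(system_reward, ideal_reward)


if __name__ == "__main__":
    # Placeholder "ideal" objective; the paper's ideal agentic function is not reproduced here.
    ideal = np.array([1.0, 0.0, 2.0, 5.0])
    rescaled = 2.0 * ideal + 3.0   # same preferences, different scale and offset
    reversed_ = -ideal             # opposite preferences
    print(agency_score(rescaled, ideal))
    print(agency_score(reversed_, ideal))
```

Under these assumptions, a system whose reward is a positively rescaled and shifted copy of the ideal objective sits at distance zero from it, while one with exactly reversed preferences sits at the maximum distance of one. A detector built this way would flag systems whose distance falls below some chosen threshold, though how the threshold is set is outside what the abstract states.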
