From homeostasis to resource sharing: Biologically and economically aligned multi-objective multi-agent gridworld-based AI safety benchmarks
This work provides new empirical testing grounds for AI safety researchers, addressing a gap in existing benchmarks by incorporating biologically and economically motivated themes.
This paper introduces eight multi-objective, multi-agent gridworld benchmarks for AI safety, focusing on themes like homeostasis, diminishing returns, sustainability, and resource sharing. These environments are designed to illustrate pitfalls such as unbounded maximization, over-optimization, and resource depletion.
Developing safe, aligned agentic AI systems requires comprehensive empirical testing, yet many existing benchmarks neglect crucial themes from biology and economics, both time-tested fundamental sciences describing our needs and preferences. To address this gap, the present work introduces a set of multi-objective, multi-agent alignment benchmarks built on biologically and economically motivated themes that have been neglected in current mainstream discussions on AI safety: homeostasis for bounded and biological objectives, diminishing returns for unbounded, instrumental, and business objectives, the sustainability principle, and resource sharing. Eight main benchmark environments have been implemented on these themes to illustrate key pitfalls and challenges in agentic AI systems, such as unboundedly maximizing a homeostatic objective, over-optimizing one objective at the expense of others, neglecting safety constraints, or depleting shared resources.
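The contrast between a homeostatic objective and a diminishing-returns objective can be illustrated with a minimal sketch. The function names, the quadratic penalty, and the logarithmic utility below are illustrative assumptions, not the reward definitions used in the benchmarks themselves.

```python
import math


def homeostatic_reward(level: float, setpoint: float) -> float:
    """Hypothetical bounded objective (assumed quadratic form).

    The reward peaks at the setpoint and penalizes both deficit and
    excess, so pushing the level ever higher is harmful.
    """
    return -((level - setpoint) ** 2)


def diminishing_returns_reward(amount: float) -> float:
    """Hypothetical unbounded objective (assumed logarithmic form).

    The reward keeps growing with the amount collected, but each
    additional unit contributes less than the previous one.
    """
    return math.log1p(amount)


if __name__ == "__main__":
    setpoint = 5.0
    # Overshooting the setpoint is penalized as much as undershooting it.
    for level in (2.0, 5.0, 8.0):
        print(f"level={level}: homeostatic={homeostatic_reward(level, setpoint)}")
    # Marginal gains shrink as the collected amount grows.
    for amount in (0.0, 1.0, 2.0):
        print(f"amount={amount}: diminishing={diminishing_returns_reward(amount):.3f}")
```

Under these assumed forms, an agent that unboundedly maximizes the homeostatic quantity is penalized once it passes the setpoint, while the concave shape of the second objective makes it increasingly unattractive to over-optimize one objective at the expense of others.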