AI, MA | Nov 6, 2025

When Empowerment Disempowers

Claire Yang, Maya Cakmak, Max Kleiman-Weiner

arXiv:2511.04177v13.3 | h-index: 2 | Has Code

Originality: Incremental advance

AI Analysis

This work highlights an alignment challenge for AI in social environments such as homes and hospitals: seemingly benign objectives can become misaligned when scaled to multi-agent contexts. The finding is incremental but important.

The paper addresses a problem with empowerment-based AI assistance: an agent that aims to help one human can harm others in multi-human settings. Using a gridworld test suite, the authors show that such assistance reduces another human's influence and rewards by up to 40% in some scenarios.

Empowerment, a measure of an agent's ability to control its environment, has been proposed as a universal goal-agnostic objective for motivating assistive behavior in AI agents. While multi-human settings like homes and hospitals are promising for AI assistance, prior work on empowerment-based assistance assumes that the agent assists one human in isolation. We introduce an open-source multi-human gridworld test suite, Disempower-Grid. Using Disempower-Grid, we empirically show that assistive RL agents optimizing for one human's empowerment can significantly reduce another human's environmental influence and rewards, a phenomenon we formalize as disempowerment. We characterize when disempowerment occurs in these environments and show that joint empowerment mitigates disempowerment at the cost of the user's reward. Our work reveals a broader challenge for the AI alignment community: goal-agnostic objectives that seem aligned in single-agent settings can become misaligned in multi-agent contexts.
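
To make the notion of empowerment concrete, here is a minimal sketch, not the authors' implementation and not part of Disempower-Grid. It assumes fully deterministic grid dynamics, in which case n-step empowerment reduces to the log of the number of distinct states reachable in n steps. The action set, the function names (step, empowerment), and the 3x3 layout are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical action set: stay plus four-connected moves (an assumption,
# not taken from Disempower-Grid).
ACTIONS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]


def step(pos, action, blocked, size):
    """Deterministic transition: move unless the target cell is blocked or off-grid."""
    r, c = pos[0] + action[0], pos[1] + action[1]
    if 0 <= r < size and 0 <= c < size and (r, c) not in blocked:
        return (r, c)
    return pos


def empowerment(pos, blocked, size, horizon):
    """n-step empowerment in bits, assuming deterministic dynamics.

    With deterministic transitions, the channel capacity from action
    sequences to final states equals log2 of the number of distinct
    final states reachable in `horizon` steps.
    """
    reachable = {pos}
    for _ in range(horizon):
        reachable = {step(p, a, blocked, size) for p in reachable for a in ACTIONS}
    return math.log2(len(reachable))


# A bystander stands in the corner of a 3x3 grid.
before = empowerment((0, 0), set(), size=3, horizon=2)
# Suppose an assistant, while helping someone else, leaves an obstacle at (0, 1).
after = empowerment((0, 0), {(0, 1)}, size=3, horizon=2)
print(f"bystander empowerment: {before:.3f} bits -> {after:.3f} bits")
```

In this toy layout the bystander's set of reachable cells shrinks from six to four once the obstacle is placed, so their empowerment drops even though nothing was done to them directly. This is only an illustration of the kind of side effect the abstract calls disempowerment; the paper's environments, agents, and formal definition may differ.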
