AI MA Sep 15, 2025

Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation

arXiv:2509.12179v5
2 citations
h-index: 1
Originality: Highly original
AI Analysis

This paper addresses AI alignment in collaborative human-AI systems by proposing a bidirectional approach, which it presents as a new paradigm rather than an incremental change.

The paper shifts alignment from a single-directional paradigm to co-alignment, in which humans and AI mutually adapt. In collaborative navigation it reports 85.5% success versus a 70.3% baseline (a gain of 15.2 percentage points), along with 230% better mutual adaptation and a 23% improvement in out-of-distribution robustness, which the authors frame as a safety gain.

Current AI alignment through RLHF follows a single-directional paradigm in which AI conforms to human preferences while human cognition is treated as fixed. We propose a shift to co-alignment through Bidirectional Cognitive Alignment (BiCA), where humans and AI mutually adapt. BiCA uses learnable protocols, representation mapping, and KL-budget constraints for controlled co-evolution. In collaborative navigation, BiCA achieved 85.5% success versus a 70.3% baseline, with 230% better mutual adaptation and 332% better protocol convergence. Emergent protocols outperformed handcrafted ones by 84%, while bidirectional adaptation unexpectedly improved safety (+23% out-of-distribution robustness). The 46% synergy improvement demonstrates that optimal collaboration exists at the intersection, not the union, of human and AI capabilities, validating the shift from single-directional to co-alignment paradigms.
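The abstract names KL-budget constraints but does not spell out their form. As a rough illustration of the general idea only, the sketch below caps how far an updated discrete policy may drift from a reference policy, measured by KL divergence. The function names (`kl_divergence`, `mix`, `limit_update`), the use of linear interpolation, and the bisection search are all assumptions made for this example, not the paper's method.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support.

    Assumes q is strictly positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mix(ref, proposed, t):
    """Linear interpolation between ref (t=0) and proposed (t=1)."""
    return [(1 - t) * r + t * n for r, n in zip(ref, proposed)]


def limit_update(ref, proposed, budget, iters=50):
    """Hypothetical KL-budget step: move from ref toward proposed as far as
    possible while keeping KL(result || ref) <= budget.

    KL(mix(t) || ref) is zero at t=0 and non-decreasing in t, so a
    bisection on t finds the largest admissible step.
    """
    if kl_divergence(proposed, ref) <= budget:
        return list(proposed)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_divergence(mix(ref, proposed, mid), ref) <= budget:
            lo = mid  # still inside the budget, try a larger step
        else:
            hi = mid  # over budget, shrink the step
    return mix(ref, proposed, lo)


if __name__ == "__main__":
    ref = [0.25, 0.25, 0.25, 0.25]      # reference policy (illustrative)
    proposed = [0.7, 0.1, 0.1, 0.1]     # unconstrained update (illustrative)
    limited = limit_update(ref, proposed, budget=0.05)
    print(limited, kl_divergence(limited, ref))
```

Under this reading, a small budget keeps each adaptation step close to the previous behaviour, which is one plausible way to interpret "controlled co-evolution"; the paper itself may apply the constraint differently, for example as a penalty term in the training objective.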
