AI · HC · May 21

Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts

arXiv:2605.22720 · 6.3

Predicted impact: top 94% in AI · last 90 days · Originality: Incremental advance

AI Analysis

For developers and deployers of LLMs, this work highlights a critical alignment failure in conflict settings that can deepen societal divisions, making model choice a safety issue.

The paper tests nine AI model configurations on 90 scenarios designed to surface misaligned behavior in conflict contexts, finding failure rates from 6% to 47%. Five of the nine configurations failed 80-100% of the time when users pushed for 'balance' in cases where international courts have already assigned responsibility. The authors release the first evaluation framework for this domain.

AI models are already deployed in societies affected by armed conflict, and journalists, humanitarian workers, governments and ordinary citizens rely on them for information or for their work processes. No established practice exists for checking whether their outputs can make those conflicts worse. We tested nine model configurations from four providers (OpenAI, Anthropic, DeepSeek, xAI) on 90 multi-turn scenarios designed to surface misaligned behaviour in conflict contexts: false equivalence between documented atrocities, denial of genocide, and failure to recognise ethnic slurs, among others. When such outputs feed into journalism, humanitarian reporting, or public debate, they can deepen divisions in fragile societies. Failure rates range from 6% to 47% between the best and worst performing models, which makes model choice a safety question in its own right. When users pushed for "balance" in cases where international courts have already assigned responsibility, five of nine configurations failed 80% to 100% of the time. We release the first evaluation framework for this domain and propose adding it to alignment evaluation portfolios.
