CL · May 30

Not All Flips Are Conformity: Decomposing Stance Convergence in Multi-Agent LLM Debate

Xiqi Hao, Zengqing Wu, Yu-Xuan Qiu, Chuan Xiao, Ruiqi Xu, Shuyuan Zheng, Jianbin Qin

arXiv:2606.0082049.2

AI Analysis

For researchers using multi-agent debate to improve LLM reasoning, this work shows that convergence often reflects instability or conformity rather than genuine deliberation, and that conformity is predominantly harmful. This challenges the assumption that debate reliably improves accuracy.

The paper shows that answer convergence in multi-agent LLM debate conflates spontaneous instability, conformity, and persuasion. Using a three-source decomposition framework, the authors find 29% strict conformity in the primary setting, with 57-77% of conformity flips going from correct to wrong across model replications. They also find that even vacuous reasoning is associated with 20-39% error adoption among resistant agents. A risk-targeted intervention reduces harmful conformity by 13.6 percentage points but, without correctness labels, does not improve accuracy.
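The three-way split can be illustrated with a toy attribution rule. This is a minimal sketch under stated assumptions, not the authors' procedure. It assumes each agent-question observation is rerun under three hypothetical counterfactual conditions: self-reflection only, peer stances without reasoning, and peer stances with reasoning. It counts a flip whenever the final answer differs from the Round 0 answer and attributes each flip to the first condition that triggers it. The `Observation` class, the condition names, and the precedence order are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One agent-question pair under three hypothetical counterfactual conditions."""
    round0: str        # the agent's initial (Round 0) answer
    self_reflect: str  # final answer after self-reflection only (no peer input)
    stance_only: str   # final answer after seeing peer answers without reasoning
    full_debate: str   # final answer after seeing peer answers plus reasoning
    gold: str          # correct answer, used only to label flips harmful or beneficial


def attribute(obs: Observation) -> str:
    """Assign an observation to exactly one source (assumed precedence order)."""
    if obs.self_reflect != obs.round0:
        return "instability"   # changes even with no peer input
    if obs.stance_only != obs.round0:
        return "conformity"    # stable alone, but moves once peer stances are shown
    if obs.full_debate != obs.round0:
        return "persuasion"    # moves only when peer reasoning is added
    return "stable"


def decompose(observations):
    """Return the share of observations per source and the harmful share of conformity."""
    n = len(observations)
    if n == 0:
        raise ValueError("no observations")
    counts = {"instability": 0, "conformity": 0, "persuasion": 0, "stable": 0}
    harmful_conformity = 0
    for obs in observations:
        source = attribute(obs)
        counts[source] += 1
        # A conformity flip is harmful if it moves from the correct answer to a wrong one.
        if source == "conformity" and obs.round0 == obs.gold and obs.stance_only != obs.gold:
            harmful_conformity += 1
    rates = {k: v / n for k, v in counts.items()}
    conf = counts["conformity"]
    harmful_share = harmful_conformity / conf if conf else 0.0
    return rates, harmful_share


# Made-up toy observations (not the paper's data).
toy = [
    Observation("A", "B", "B", "B", "A"),  # flips under self-reflection alone
    Observation("A", "A", "C", "C", "A"),  # conformity, correct -> wrong
    Observation("D", "D", "A", "A", "A"),  # conformity, wrong -> correct
    Observation("A", "A", "A", "B", "A"),  # moves only with peer reasoning
]
rates, harmful_share = decompose(toy)
print(rates)          # {'instability': 0.25, 'conformity': 0.5, 'persuasion': 0.25, 'stable': 0.0}
print(harmful_share)  # 0.5
```

These toy numbers are not the paper's results. The harmful/beneficial split depends on `gold`, which mirrors the abstract's point that harmful and beneficial influence cannot be told apart without correctness labels.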

Multi-agent debate (MAD) is a promising strategy for improving LLM reasoning, but when agents converge on a shared answer, it is unclear whether that convergence reflects genuine deliberation or social compliance. We show that the conventional answer flip rate conflates three distinct mechanisms: spontaneous instability, stance-induced conformity, and reasoning-induced persuasion. Our three-source decomposition framework isolates each through controlled counterfactual conditions. In the primary MMLU-Pro setting, 37% of agent-question observations change under self-reflection alone, while robustness tests show substantial model-dependent instability across GPQA-Diamond and three model families; strict conformity is 29% in the primary setting and remains predominantly harmful across model replications (57-77% correct-to-wrong). A controlled information-gradient experiment reveals that even vacuous reasoning is associated with 20-39% error adoption among resistant agents, with reasoning-like presentation carrying substantial persuasive weight. Harmful conformity can be predicted from Round 0 features (AUC = 0.79), and risk-targeted intervention reduces it by 13.6 percentage points (p < 0.001). However, without correctness labels or self-reflection controls, reducing peer adoption does not improve accuracy, because harmful and beneficial influence cannot be distinguished.
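The reported AUC of 0.79 is a ranking metric. It is the probability that a randomly chosen harmful-conformity case receives a higher risk score than a randomly chosen non-harmful one. The sketch below computes it with the pairwise (Mann-Whitney) formulation and then flags high-risk cases with a threshold, as one hypothetical way a risk-targeted intervention could select its targets. The abstract does not specify the Round 0 features, the classifier, or the intervention, so `risk_auc`, `flag_high_risk`, the scores, the labels, and the 0.5 threshold are all illustrative assumptions.

```python
def risk_auc(scores, labels):
    """ROC AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive case has the higher score; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def flag_high_risk(scores, threshold):
    """Return indices of cases whose predicted risk meets the threshold
    (hypothetical targeting rule for an intervention)."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy data: risk scores from some Round 0 predictor, and whether each case
# actually ended in harmful conformity (1) or not (0). Values are made up.
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 0, 0, 1, 0]

print(round(risk_auc(scores, labels), 3))  # 0.833 (5 of 6 pairs ranked correctly)
print(flag_high_risk(scores, 0.5))         # [0, 1, 3]
```

Computing the AUC needs labeled outcomes, while applying the threshold to new scores does not. In this toy example the threshold also flags one non-harmful case (index 1), showing how targeting by predicted risk can suppress some adoption that was not harmful.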

