The impact of multi-agent debate protocols on debate quality: a controlled case study
For researchers designing multi-agent debate systems, this work clarifies that protocol choice significantly impacts debate quality, with Rank-Adaptive Cross-Round being preferable when consensus is prioritized.
The study isolates the effect of debate protocols in multi-agent systems by comparing three protocols (Within-Round, Cross-Round, and Rank-Adaptive Cross-Round) against a no-interaction baseline in a macroeconomic case study. Results show that Rank-Adaptive Cross-Round achieves faster convergence, while Within-Round increases peer-referencing, revealing a trade-off between interaction and convergence.
Multi-agent debate (MAD) systems often report performance gains. However, because the debate protocol (e.g., number of agents, rounds, and aggregation rule) is typically held fixed while model-related factors vary, it is difficult to disentangle protocol effects from model effects. To isolate protocol effects, we compare three main protocols against a No-Interaction baseline (NI; independent responses without peer visibility): Within-Round (WR; agents see only current-round contributions), Cross-Round (CR; full prior-round context), and a novel Rank-Adaptive Cross-Round (RA-CR; an external judge model dynamically reorders agents and silences one per round). In a controlled macroeconomic case study (20 diverse events, five random seeds, matched prompts and decoding), RA-CR converges faster than CR, WR shows higher peer-referencing, and NI maximizes Argument Diversity, which is unaffected across the main protocols. These results reveal a trade-off between interaction (peer-referencing rate) and convergence (consensus formation), confirming that protocol design matters. When consensus is prioritized, RA-CR outperforms the other protocols.
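The visibility rules that distinguish the four conditions can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's implementation: the function names, the message representation, and the score dictionary are hypothetical. It also assumes that CR and RA-CR expose only completed prior rounds, and that RA-CR silences the lowest-scored agent. The abstract states only that a judge reorders agents and silences one per round, so that selection rule is an assumption.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (agent name, text); hypothetical representation


def visible_context(
    protocol: str,
    history: List[List[Message]],
    current: List[Message],
) -> List[Message]:
    """Return the peer messages an agent may read before it speaks.

    history: completed rounds, oldest first.
    current: messages already posted in the ongoing round.
    """
    if protocol == "NI":
        # No-Interaction baseline: no peer visibility at all.
        return []
    if protocol == "WR":
        # Within-Round: only current-round contributions are visible.
        return list(current)
    if protocol in ("CR", "RA-CR"):
        # Cross-Round: full prior-round context (assumed: prior rounds only).
        return [msg for rnd in history for msg in rnd]
    raise ValueError(f"unknown protocol: {protocol}")


def rank_adaptive_order(
    agents: List[str],
    judge_scores: Dict[str, float],
) -> Tuple[List[str], str]:
    """Reorder agents by judge score and silence one of them.

    Assumption: the lowest-scored agent is the one silenced.
    Returns (speaking order for the next round, silenced agent).
    """
    ranked = sorted(agents, key=lambda a: judge_scores[a], reverse=True)
    return ranked[:-1], ranked[-1]
```

For example, with judge scores `{"a": 0.2, "b": 0.9, "c": 0.5}`, `rank_adaptive_order(["a", "b", "c"], ...)` returns the speaking order `["b", "c"]` and silences `"a"` for that round. Keeping the visibility rule in a single function makes it straightforward to hold prompts and decoding fixed while swapping only the protocol.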