CL AI LG OT

May 27, 2025

Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making

Yihan Wang, Qiao Yan, Zhenghao Xing, Lihao Liu, Junjun He, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng

arXiv:2505.21503v1, 6.72 citations, h-index: 21

Originality: Incremental advance

AI Analysis

This work addresses a critical issue in clinical AI by improving diagnostic accuracy in complex cases, though it is an incremental advance in multi-agent reasoning methods.

The paper tackles Silent Agreement in multi-agent LLMs for clinical decision-making, where agents converge prematurely on a diagnosis. It introduces a Catfish Agent that injects structured dissent. Across multiple medical benchmarks, the approach consistently improves on both single- and multi-agent frameworks, including leading models such as GPT-4o and DeepSeek-R1.

Large language models (LLMs) have demonstrated strong potential in clinical question answering, with recent multi-agent frameworks further improving diagnostic accuracy via collaborative reasoning. However, we identify a recurring issue of Silent Agreement, where agents prematurely converge on diagnoses without sufficient critical analysis, particularly in complex or ambiguous cases. We present a new concept called the Catfish Agent, a role-specialized LLM designed to inject structured dissent and counter silent agreement. Inspired by the "catfish effect" in organizational psychology, the Catfish Agent challenges emerging consensus to stimulate deeper reasoning. We formulate two mechanisms to encourage effective and context-aware interventions: (i) a complexity-aware intervention that modulates agent engagement based on case difficulty, and (ii) a tone-calibrated intervention articulated to balance critique and collaboration. Evaluations on nine medical Q&A and three medical VQA benchmarks show that our approach consistently outperforms both single- and multi-agent LLM frameworks, including leading commercial models such as GPT-4o and DeepSeek-R1.
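The two mechanisms named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no thresholds, prompts, or data structures, so the complexity levels, the round counts, the agreement threshold, and the tone templates below are all hypothetical, as are the names `agreement_ratio` and `plan_intervention`. The sketch only shows one plausible way a dissenting agent might decide how hard to push back once the other agents have converged.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping from case difficulty to the number of challenge rounds.
# The abstract only says engagement is modulated by case difficulty.
COMPLEXITY_ROUNDS = {"low": 0, "moderate": 1, "high": 2}

# Hypothetical tone templates balancing critique and collaboration.
TONE_TEMPLATES = {
    "mild": "Most of us favor {answer}. Is there any finding that does not fit?",
    "strong": "I disagree with {answer}. Justify it against the alternatives.",
}


@dataclass
class Intervention:
    rounds: int   # how many times the dissenting agent challenges the group
    prompt: str   # challenge text; empty when no intervention is planned


def agreement_ratio(answers):
    """Return the fraction of agents backing the most common answer."""
    if not answers:
        raise ValueError("answers must be non-empty")
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


def plan_intervention(answers, complexity, tone="mild", agreement_threshold=0.8):
    """Plan a challenge when agents agree strongly on a non-trivial case."""
    rounds = COMPLEXITY_ROUNDS[complexity]
    # Without near-consensus there is nothing to disrupt (assumed threshold).
    if rounds == 0 or agreement_ratio(answers) < agreement_threshold:
        return Intervention(rounds=0, prompt="")
    leading_answer = Counter(answers).most_common(1)[0][0]
    prompt = TONE_TEMPLATES[tone].format(answer=leading_answer)
    return Intervention(rounds=rounds, prompt=prompt)


if __name__ == "__main__":
    plan = plan_intervention(["pneumonia"] * 3, "high", tone="strong")
    print(plan.rounds, plan.prompt)
```

In this sketch, a unanimous answer on a "high" complexity case triggers two challenge rounds, while a split vote or a "low" complexity case triggers none. A real system would presumably estimate complexity and select tone with an LLM rather than fixed tables.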

