CV · AI · Jun 26, 2025

Evidence-based diagnostic reasoning with multi-agent copilot for human pathology

arXiv:2506.20964v1 · 15 citations · h-index: 30
Originality: Highly original
AI Analysis

This work addresses the limited diagnostic reasoning capabilities of the computational pathology tools available to pathologists. It presents a novel method rather than an incremental improvement.

The researchers address the limitations of multimodal large language models (MLLMs) in computational pathology with two contributions. The first is PathChat+, a model trained on over 1 million pathology-specific instruction samples that substantially outperforms prior models. The second is SlideSeek, a multi-agent system built on PathChat+ that autonomously evaluates whole-slide images and reaches high accuracy on DDxBench, a challenging differential diagnosis benchmark.

Pathology is experiencing rapid digital transformation driven by whole-slide imaging and artificial intelligence (AI). While deep learning-based computational pathology has achieved notable success, traditional models primarily focus on image analysis without integrating natural language instruction or rich, text-based context. Current multimodal large language models (MLLMs) in computational pathology face several limitations: insufficient training data, inadequate support and evaluation for multi-image understanding, and a lack of autonomous diagnostic reasoning capabilities. To address these limitations, we introduce PathChat+, a new MLLM specifically designed for human pathology, trained on over 1 million diverse, pathology-specific instruction samples and nearly 5.5 million question-answer turns. Extensive evaluations across diverse pathology benchmarks demonstrated that PathChat+ substantially outperforms the prior PathChat copilot, as well as both state-of-the-art (SOTA) general-purpose and other pathology-specific models. Furthermore, we present SlideSeek, a reasoning-enabled multi-agent AI system that leverages PathChat+ to autonomously evaluate gigapixel whole-slide images (WSIs) through iterative, hierarchical diagnostic reasoning. SlideSeek reaches high accuracy on DDxBench, a challenging open-ended differential diagnosis benchmark, and can also generate visually grounded, human-interpretable summary reports.
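The abstract says SlideSeek navigates gigapixel WSIs "through iterative, hierarchical diagnostic reasoning" but does not describe the mechanics. To make the general idea concrete, below is a minimal sketch of a generic coarse-to-fine tile search over an image pyramid. It is not the authors' implementation. Several assumptions apply:

- The pyramid downsamples by a factor of 2 per level.
- Level 0 is the finest resolution.
- `hierarchical_search`, `children`, `contains` and the `score` callback are hypothetical names introduced only for illustration.

In a real system, `score` would stand in for whatever judgment (for example, an MLLM such as PathChat+) decides which regions deserve a closer look. Here it is a toy function.

```python
from typing import Callable, List, Tuple

# Hypothetical tile address: (level, row, col); level 0 is the finest resolution.
Tile = Tuple[int, int, int]


def children(tile: Tile) -> List[Tile]:
    """Return the 2x2 tiles one level finer that cover the same area
    (assumes each pyramid level downsamples by a factor of 2)."""
    level, r, c = tile
    return [(level - 1, 2 * r + dr, 2 * c + dc) for dr in (0, 1) for dc in (0, 1)]


def hierarchical_search(
    top_level: int,
    grid_rows: int,
    grid_cols: int,
    score: Callable[[Tile], float],
    keep: int = 2,
) -> List[Tile]:
    """Generic coarse-to-fine search (illustrative, not SlideSeek's method):
    score every tile at the coarsest level, keep the `keep` best, expand them
    into finer tiles, and repeat until level 0 is reached."""
    frontier = [(top_level, r, c) for r in range(grid_rows) for c in range(grid_cols)]
    if not frontier:
        return []
    while True:
        # sorted() is stable, so ties keep their original frontier order.
        best = sorted(frontier, key=score, reverse=True)[:keep]
        if best[0][0] == 0:
            return best
        frontier = [child for t in best for child in children(t)]


def contains(tile: Tile, target: Tuple[int, int]) -> bool:
    """True if `tile` covers the finest-level tile `target` = (row, col)."""
    level, r, c = tile
    return (target[0] >> level) == r and (target[1] >> level) == c


# Toy usage: a hypothetical region of interest sits at finest tile (5, 3)
# inside an 8x8 slide whose coarsest level (3) is a single tile.
result = hierarchical_search(3, 1, 1, lambda t: 1.0 if contains(t, (5, 3)) else 0.0)
print(result)  # → [(0, 5, 3), (0, 4, 2)]
```

The sketch only scores a handful of tiles per level instead of every tile at full resolution, which is what makes gigapixel images tractable. Keeping more than one candidate (`keep=2`) lets the search recover if the single top-scoring coarse tile turns out to be misleading.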

