IR AI Aug 10, 2025

A Multi-Agent Approach to Neurological Clinical Reasoning

Moran Sorka, Alon Gorenshtein, Dvir Aran, Shahar Shelly

arXiv:2508.14063v1

2 citations

h-index: 6

PLOS Digital Health

Originality: Incremental advance

AI Analysis

This work addresses the challenge of AI assistance in complex clinical contexts such as neurology and offers a promising direction for enhancing medical reasoning. It appears incremental, however, because it extends existing multi-agent approaches to a specific domain.

The researchers set out to evaluate and improve large language models' handling of specialized neurological reasoning. They built a benchmark of 305 neurology board exam questions and compared base models, retrieval-augmented generation, and a multi-agent system. The multi-agent system, which decomposes reasoning into specialized cognitive functions, reached 89.2% accuracy with LLaMA 3.3-70B versus 69.5% for the base model, with substantial gains on the most complex questions.
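To put the 89.2% versus 69.5% comparison in concrete terms, the short sketch below converts the two accuracies into an absolute gain, a relative error reduction, and an approximate count of additional correct answers. It assumes both figures were measured over all 305 benchmark questions; the function name `accuracy_gain` is hypothetical, and the question counts are back-calculated from rounded percentages rather than reported by the authors.

```python
def accuracy_gain(base_pct: float, new_pct: float, n_questions: int):
    """Summarize the difference between two accuracies on the same question set.

    Returns (absolute gain in percentage points,
             relative error reduction in percent,
             approximate number of additional correct answers).
    """
    abs_gain = round(new_pct - base_pct, 1)
    # Share of the base model's errors that the new system eliminates.
    rel_error_reduction = round((new_pct - base_pct) / (100 - base_pct) * 100, 1)
    # Approximate counts, recovered from the rounded percentages (an assumption).
    base_correct = round(n_questions * base_pct / 100)
    new_correct = round(n_questions * new_pct / 100)
    return abs_gain, rel_error_reduction, new_correct - base_correct


if __name__ == "__main__":
    print(accuracy_gain(69.5, 89.2, 305))
```

Under these assumptions the gain is 19.7 percentage points, which corresponds to removing roughly two thirds of the base model's errors, or about 60 more questions answered correctly.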

Large language models (LLMs) have shown promise in medical domains, but their ability to handle specialized neurological reasoning requires systematic evaluation. We developed a comprehensive benchmark using 305 questions from Israeli Board Certification Exams in Neurology, classified along three complexity dimensions: factual knowledge depth, clinical concept integration, and reasoning complexity. We evaluated ten LLMs using base models, retrieval-augmented generation (RAG), and a novel multi-agent system. Results showed significant performance variation. OpenAI-o1 achieved the highest base performance (90.9% accuracy), while specialized medical models performed poorly (52.9% for Meditron-70B). RAG provided modest benefits but limited effectiveness on complex reasoning questions. In contrast, our multi-agent framework, decomposing neurological reasoning into specialized cognitive functions including question analysis, knowledge retrieval, answer synthesis, and validation, achieved dramatic improvements, especially for mid-range models. The LLaMA 3.3-70B-based agentic system reached 89.2% accuracy versus 69.5% for its base model, with substantial gains on level 3 complexity questions. The multi-agent approach transformed inconsistent subspecialty performance into uniform excellence, addressing neurological reasoning challenges that persisted with RAG enhancement. We validated our approach using an independent dataset of 155 neurological cases from MedQA. Results confirm that structured multi-agent approaches designed to emulate specialized cognitive processes significantly enhance complex medical reasoning, offering promising directions for AI assistance in challenging clinical contexts.
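The four stages named in the abstract (question analysis, knowledge retrieval, answer synthesis, validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe their prompts, retrieval method, or validation logic, so every name here (`CaseState`, `run_pipeline`, the keyword-overlap retrieval, the `fake_llm` stand-in) is a hypothetical placeholder chosen to show the control flow only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model interface: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class CaseState:
    """Shared state passed from one stage to the next."""
    question: str
    options: Dict[str, str]
    analysis: str = ""
    evidence: List[str] = field(default_factory=list)
    answer: str = ""
    validated: bool = False


def analyze_question(state: CaseState, llm: LLM) -> None:
    # Stage 1: extract the key clinical findings from the question.
    state.analysis = llm("Analyze: " + state.question)


def retrieve_knowledge(state: CaseState, corpus: List[str]) -> None:
    # Stage 2: toy keyword-overlap retrieval (a stand-in for a real retriever).
    terms = set(state.analysis.lower().split())
    state.evidence = [p for p in corpus if terms & set(p.lower().split())]


def synthesize_answer(state: CaseState, llm: LLM) -> None:
    # Stage 3: choose an option given the analysis and retrieved evidence.
    listed = "\n".join(f"{k}. {v}" for k, v in state.options.items())
    prompt = ("Choose: " + state.question + "\n" + listed
              + "\nEvidence:\n" + "\n".join(state.evidence))
    state.answer = llm(prompt).strip()[:1].upper()


def validate_answer(state: CaseState) -> None:
    # Stage 4: minimal check that the answer is one of the offered options.
    state.validated = state.answer in state.options


def run_pipeline(question: str, options: Dict[str, str],
                 corpus: List[str], llm: LLM) -> CaseState:
    state = CaseState(question=question, options=options)
    analyze_question(state, llm)
    retrieve_knowledge(state, corpus)
    synthesize_answer(state, llm)
    validate_answer(state)
    return state


def fake_llm(prompt: str) -> str:
    # Deterministic stand-in so the sketch runs without any model.
    return "B" if prompt.startswith("Choose:") else "ptosis fatigable weakness"


if __name__ == "__main__":
    corpus = ["Fatigable weakness suggests myasthenia gravis.",
              "Migraine aura is usually visual."]
    options = {"A": "Migraine", "B": "Myasthenia gravis"}
    result = run_pipeline("Patient with ptosis worse in the evening.",
                          options, corpus, fake_llm)
    print(result.answer, result.validated, result.evidence)
```

Keeping each stage as a separate function over shared state mirrors the decomposition the abstract describes, and makes it straightforward to swap the stub for a real model call or to let a failed validation trigger another synthesis pass.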
