CL · AI · Jul 2, 2024

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

arXiv:2407.02483v2 · 120 citations · h-index: 5 · Has Code
AI Analysis

This work addresses the need for more effective multi-modal AI in healthcare. It offers an incremental improvement by adapting existing agent-based approaches to the medical domain.

The paper tackles the limited generality of Multi-Modal Large Language Models (MLLMs) on medical tasks by introducing MMedAgent, which the authors present as the first agent designed for the medical field. They report that it outperforms state-of-the-art open-source methods and GPT-4o across a range of tasks.

Multi-Modal Large Language Models (MLLMs), despite their success, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To bridge this gap, this paper introduces the first agent explicitly designed for the medical field, named Multi-modal Medical Agent (MMedAgent). We curate an instruction-tuning dataset comprising six medical tools solving seven tasks across five modalities, enabling the agent to choose the most suitable tools for a given task. Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art open-source methods and even the closed-source model GPT-4o. Furthermore, MMedAgent can efficiently update and integrate new medical tools. Code and models are publicly available.
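The abstract describes an agent that selects a specialized model as a tool based on the user's request and the input modality, and that can take on new tools cheaply. The sketch below illustrates that dispatch pattern under stated assumptions: the `Tool` and `ToolRegistry` classes, the tool names, and the rule-based `select` method are all hypothetical. A simple lookup stands in for the tool choice that MMedAgent itself learns through instruction tuning, so this is not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Tool:
    """A specialized model exposed to the agent (hypothetical interface)."""
    name: str
    task: str                   # e.g. "detection", "segmentation"
    modalities: FrozenSet[str]  # imaging modalities the tool accepts
    run: Callable[[str], str]   # stand-in for the real model call


@dataclass
class ToolRegistry:
    tools: List[Tool] = field(default_factory=list)

    def register(self, tool: Tool) -> None:
        # Adding a tool only extends the registry; routing code is unchanged.
        self.tools.append(tool)

    def select(self, task: str, modality: str) -> Optional[Tool]:
        # Rule-based stand-in for a learned tool choice: return the first
        # tool that matches both the requested task and the input modality.
        for tool in self.tools:
            if tool.task == task and modality in tool.modalities:
                return tool
        return None


def answer(registry: ToolRegistry, task: str, modality: str, image_id: str) -> str:
    tool = registry.select(task, modality)
    if tool is None:
        # No suitable tool: assume the base MLLM answers on its own.
        return f"no tool; answered directly for {image_id}"
    return f"{tool.name}: {tool.run(image_id)}"


if __name__ == "__main__":
    reg = ToolRegistry()
    reg.register(Tool("hypothetical_detector", "detection",
                      frozenset({"MRI", "CT"}), lambda x: f"boxes for {x}"))
    reg.register(Tool("hypothetical_segmenter", "segmentation",
                      frozenset({"CT", "X-ray"}), lambda x: f"mask for {x}"))
    print(answer(reg, "segmentation", "CT", "scan_001"))   # routed to the segmenter
    print(answer(reg, "segmentation", "MRI", "scan_002"))  # no match, falls back
```

Keeping tool metadata separate from the dispatch logic means integrating a new tool is a single `register` call, which loosely mirrors the abstract's point about updating the tool set. In the paper, by contrast, the mapping from request to tool is learned from the curated instruction-tuning data rather than hard-coded.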

Code Implementations: 1 repo