AI | CL | CV | Dec 17, 2024

MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants

arXiv:2412.12661v2 | 8 citations | h-index: 23 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the scarcity of data for developing unified biomedical assistants. The advance is incremental because it builds on existing mixed-modal generative methods.

The authors tackled the lack of large-scale, diverse datasets for training biomedical AI assistants by creating MedMax, a 1.47-million-instance multimodal instruction-tuning dataset, which led to a 26% performance gain over Chameleon and 18.3% over GPT-4o on biomedical visual question-answering tasks.

Recent advancements in mixed-modal generative models have opened new avenues for developing unified biomedical assistants capable of analyzing biomedical images, answering complex questions about them, and generating multimodal patient reports. However, existing datasets face challenges such as small sizes, limited coverage of biomedical tasks and domains, and a reliance on narrow sources. To address these gaps, we present MedMax, a large-scale multimodal biomedical instruction-tuning dataset for mixed-modal foundation models. With 1.47 million instances, MedMax encompasses a diverse range of tasks, including interleaved image-text generation, biomedical image captioning and generation, visual chat, and report understanding. These tasks span knowledge across diverse biomedical domains, including radiology and histopathology, grounded in medical papers and YouTube videos. Subsequently, we fine-tune a mixed-modal foundation model on the MedMax dataset, achieving significant performance improvements: a 26% gain over the Chameleon model and an 18.3% improvement over GPT-4o across 12 downstream biomedical visual question-answering tasks. Finally, we introduce a unified evaluation suite for biomedical tasks to guide the development of mixed-modal biomedical AI assistants. The data, model, and code are available at https://mint-medmax.github.io/.

Code Implementations: 1 repo
