CV | Nov 14, 2025

MAFM^3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI

Mohammad Areeb Qazi, Munachiso S Nwadike, Ibrahim Almakky, Mohammad Yaqub, Numan Saeed

arXiv:2511.11212v1 | 3.6 | h-index: 9 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in medical AI by providing a modular adaptation method for foundation models. It is an incremental advance: it builds on existing adaptation techniques but offers a more unified approach.

The paper tackles the challenge of adapting foundation models to diverse medical imaging tasks and modalities when data is limited. It proposes MAFM^3, a framework that attaches lightweight modular components to a single foundation model so that it can handle multiple tasks and modalities, and reports improved performance, including a 5% Dice score gain over the respective baselines when PET scans are incorporated.
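The headline result is reported as a Dice score, the standard overlap metric for segmentation: for a predicted mask P and a reference mask T, Dice = 2|P ∩ T| / (|P| + |T|), ranging from 0 (no overlap) to 1 (perfect overlap). The sketch below computes it for flat binary masks in plain Python; the function name `dice_score` is hypothetical, and returning 1.0 when both masks are empty is one common convention, not necessarily the one used in the paper. The abstract does not state whether the 5% gain is in absolute Dice points or relative.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked foreground in both the prediction and the reference.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    print(dice_score([1, 1, 1, 0], [1, 1, 0, 0]))  # 2*2 / (3+2) = 0.8
```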

Foundation models are trained on extensive datasets to capture the general trends of a domain. However, in medical imaging, the scarcity of data makes it challenging to pre-train for every domain, modality, or task. Instead of building separate models, we propose MAFM^3 (Modular Adaptation of Foundation Models for Multi-Modal Medical AI), a framework that enables a single foundation model to expand into diverse domains, tasks, and modalities through lightweight modular components. These components serve as specialized skill sets that allow the system to flexibly activate the appropriate capability at inference time, depending on the input type or clinical objective. Unlike conventional adaptation methods that treat each new task or modality in isolation, MAFM^3 provides a unified and expandable framework for efficient multitask and multimodality adaptation. Empirically, we validate our approach by adapting a chest CT foundation model, initially trained for classification, into prognosis and segmentation modules. Our results show improved performance on both tasks. Furthermore, by incorporating PET scans, MAFM^3 improved the Dice score by 5% compared to the respective baselines. These findings establish that foundation models, when equipped with modular components, are not inherently constrained to their initial training scope but can evolve into multitask, multimodality systems for medical imaging. The code is available at https://github.com/Areeb2735/CTscan_prognosis_VLM
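The abstract describes modular components that act as specialized skill sets, activated at inference time according to the input type or clinical objective. The following is a minimal sketch of that dispatch idea in plain Python, not the authors' implementation: it assumes a shared, frozen backbone and a registry of lightweight modules keyed by (modality, task). The names `ModularModel`, `frozen_backbone`, and `register` are hypothetical, and the toy backbone and heads merely stand in for real networks; the linked repository contains the actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Features = List[float]


def frozen_backbone(x: List[float]) -> Features:
    # Hypothetical stand-in for the shared foundation-model encoder; it is never modified.
    return [2.0 * v for v in x]


@dataclass
class ModularModel:
    backbone: Callable[[List[float]], Features]
    # Lightweight modules keyed by (modality, task); keys and modules are illustrative.
    modules: Dict[Tuple[str, str], Callable[[Features], object]] = field(default_factory=dict)

    def register(self, modality: str, task: str, module: Callable[[Features], object]) -> None:
        # Adding a capability only adds a registry entry; the backbone is untouched.
        self.modules[(modality, task)] = module

    def __call__(self, x: List[float], modality: str, task: str) -> object:
        key = (modality, task)
        if key not in self.modules:
            raise KeyError(f"no module registered for {key}")
        # Shared features are computed once, then routed to the selected module.
        return self.modules[key](self.backbone(x))


if __name__ == "__main__":
    model = ModularModel(frozen_backbone)
    model.register("ct", "segmentation", lambda f: [1 if v > 1.0 else 0 for v in f])
    model.register("ct", "prognosis", lambda f: sum(f) / len(f))
    print(model([0.2, 0.8, 0.4], modality="ct", task="segmentation"))  # [0, 1, 0]
```

Under these assumptions, supporting a new modality such as PET means registering another (modality, task) entry, leaving the backbone and the existing modules unchanged, which is the kind of expandability the abstract emphasizes.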
