Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning

Jun-Tao Tang, Yu-Cheng Shi, Zhen-Hao Xie, Da-Wei Zhou

arXiv:2605.2611057.63 citations | Has Code

AI Analysis

For researchers in multimodal continual learning, Prism addresses the engineering bottleneck of implementing and comparing MCIT methods by providing a standardized, plug-in framework.

Prism provides a plug-in reproducible codebase for multimodal continual instruction tuning (MCIT), separating algorithmic development from backbone implementation to reduce engineering overhead and enable fair comparison. It supports large-scale training pipelines for scalable experimentation.

Multimodal Large Language Models (MLLMs) achieve versatility by reformulating diverse tasks into a unified instruction-following framework via instruction tuning. However, real-world deployment requires continuous adaptation to emerging tasks, motivating Multimodal Continual Instruction Tuning (MCIT). Despite its growing importance, current MCIT research is hindered by severe engineering bottlenecks. Existing methods are typically implemented by directly modifying the base MLLM codebase, which imposes substantial implementation overhead and yields method-specific architectures that limit code reuse and fair comparison. To address this, we introduce Prism, a plug-in reproducible codebase designed for scalable MCIT research. It separates algorithmic development from the backbone implementation via a lightweight plugin registration mechanism, so new strategies can be integrated as independent plugins without modifying the underlying MLLM codebase. This removes structural fragmentation and accelerates method development. Prism natively supports widely used large-scale training pipelines, enabling reproducible and scalable MCIT experimentation. Code is available at https://github.com/LAMDA-CL/Prism.
