CV · LG · Nov 28, 2025

Buffer replay enhances the robustness of multimodal learning under missing-modality

arXiv:2511.23070v1
Originality: Incremental advance
AI Analysis

This work addresses the robustness of multimodal AI systems in real-world settings where some input modalities may be missing, though it is an incremental improvement over existing approaches.

The paper tackles performance degradation in multimodal models when modalities are missing by introducing REplay Prompting (REP), which caches and replays early-layer features and decouples private and shared representations, achieving consistent improvements over prior methods with minimal parameter overhead.

Missing modalities consistently lead to significant performance degradation in multimodal models. Existing approaches either synthesize missing modalities at high computational cost or apply prompt-based fine-tuning that relies only on adjacent-layer features and overlooks long-distance contextual information, which may offer additional tolerance to errors when one or more modalities are missing. To address this, we introduce REplay Prompting (REP): (1) construct modality-wise feature buffers via a residual bypass to cache early-layer representations and replay them in deeper layers, mitigating information loss as network depth increases; (2) employ a private-shared feature decoupling strategy, where private buffers preserve modality-specific signals and shared buffers encode cross-modal semantics; and (3) design a task-aware dynamic initialization mechanism to configure these buffers differently, improving stability and generalization under diverse missing-modality conditions. Experiments on vision-language, vision-language-audio, and temporal multimodal benchmarks demonstrate that REP consistently outperforms prior methods under both single- and multi-modality missing scenarios, while introducing only negligible parameter overhead. These results establish REP as a lightweight and effective paradigm for robust multimodal learning in challenging missing-modality environments.
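The buffer-and-replay idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the layer stack, the sizes (`DIM`, `DEPTH`, `CACHE_UNTIL`), the zero placeholder for a missing modality, the averaging of cached features, and the mixing weight `alpha` are all assumptions made for illustration. It only shows the general pattern of caching early-layer features in private (per-modality) and shared (cross-modal) buffers and adding them back into deeper layers through a residual path.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, DEPTH, CACHE_UNTIL = 8, 4, 2  # hypothetical sizes; layers below CACHE_UNTIL count as "early"

# One toy linear+tanh layer per depth and modality (stand-ins for backbone blocks).
weights = {m: [rng.normal(scale=0.5, size=(DIM, DIM)) for _ in range(DEPTH)]
           for m in ("image", "text")}

def forward(inputs, replay=True, alpha=0.5):
    """inputs maps modality name -> feature vector, or None if that modality is missing."""
    present = [m for m, x in inputs.items() if x is not None]
    if not present:
        raise ValueError("at least one modality must be present")
    # Assumption of this sketch: a missing modality is fed as a zero placeholder.
    h = {m: (np.zeros(DIM) if x is None else np.asarray(x, dtype=float))
         for m, x in inputs.items()}
    private = {m: [] for m in h}  # modality-specific buffers
    shared = []                   # cross-modal buffer, built from present modalities only
    for layer in range(DEPTH):
        h = {m: np.tanh(weights[m][layer] @ h[m]) for m in h}
        if layer < CACHE_UNTIL:
            # Cache early-layer features.
            for m in h:
                private[m].append(h[m].copy())
            shared.append(np.mean([h[m] for m in present], axis=0))
        elif replay:
            # Replay: add averaged cached features back via a residual bypass.
            for m in h:
                cached = np.mean(private[m], axis=0) + np.mean(shared, axis=0)
                h[m] = h[m] + alpha * cached
    # Concatenate streams in a fixed (alphabetical) order: image, then text.
    return np.concatenate([h[m] for m in sorted(h)])
```

In this toy setup, calling `forward({"image": np.ones(DIM), "text": None}, replay=False)` leaves the text stream at zero throughout, whereas with `replay=True` the shared buffer carries early image features into the text stream at deeper layers. That is the kind of cross-modal compensation the shared buffers are described as providing; how REP actually initializes and combines its buffers is specified in the paper, not here.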

