Robult: Leveraging Redundancy and Modality-Specific Features for Robust Multimodal Learning
This work addresses robustness challenges in multimodal learning for real-world applications, though it appears incremental, since it builds on existing methods such as contrastive learning and reconstruction.
The paper tackles missing modalities and limited labeled data in multimodal learning by proposing Robult, a framework that combines a soft Positive-Unlabeled (PU) contrastive loss with a latent reconstruction loss, and reports superior performance over existing approaches in semi-supervised and missing-modality settings.
Addressing missing modalities and limited labeled data is crucial for advancing robust multimodal learning. We propose Robult, a scalable framework designed to mitigate these challenges by preserving modality-specific information and leveraging redundancy through a novel information-theoretic approach. Robult optimizes two core objectives: (1) a soft Positive-Unlabeled (PU) contrastive loss that maximizes task-relevant feature alignment while effectively utilizing limited labeled data in semi-supervised settings, and (2) a latent reconstruction loss that ensures unique modality-specific information is retained. These strategies, embedded within a modular design, enhance performance across various downstream tasks and ensure resilience to incomplete modalities during inference. Experimental results across diverse datasets show that Robult achieves superior performance over existing approaches in both semi-supervised and missing-modality settings. Furthermore, its lightweight design promotes scalability and seamless integration with existing architectures, making it suitable for real-world multimodal applications.
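The abstract names Robult's two objectives but gives no formulas, so the following is only a minimal, hypothetical sketch of how such losses could look, not the paper's definition. It assumes: an InfoNCE-style cross-modal contrastive loss over cosine similarities with temperature `tau`; positive weights of 1 for the paired sample and for labeled same-class pairs, 0 for labeled different-class pairs, and a soft constant `unlabeled_weight` whenever either sample is unlabeled (loosely following PU learning's view of unlabeled data as a mix of positives and negatives); mean squared error for the latent reconstruction loss; and an additive combination weighted by `lam`. All function and parameter names are illustrative.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_pu_contrastive_loss(
    z_a: List[Vector],
    z_b: List[Vector],
    labels: List[Optional[int]],
    tau: float = 0.1,
    unlabeled_weight: float = 0.5,
) -> float:
    """Hypothetical soft PU contrastive loss between two modalities.

    Anchor i in modality A is contrasted with every sample j in modality B.
    Positive weights: 1 for the paired sample (j == i) and for labeled pairs
    sharing a class; 0 for labeled pairs with different classes; a soft
    `unlabeled_weight` when either sample is unlabeled (label is None).
    """
    n = len(z_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(z_a[i], z_b[j]) / tau for j in range(n)]
        # Numerically stable log-softmax over all candidates j.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        log_probs = [x - log_z for x in logits]
        weights = []
        for j in range(n):
            if j == i:
                w = 1.0
            elif labels[i] is None or labels[j] is None:
                w = unlabeled_weight
            else:
                w = 1.0 if labels[i] == labels[j] else 0.0
            weights.append(w)
        # Weighted average negative log-likelihood of the (soft) positives.
        total += -sum(w * lp for w, lp in zip(weights, log_probs)) / sum(weights)
    return total / n


def latent_reconstruction_loss(h: List[Vector], h_hat: List[Vector]) -> float:
    """Mean squared error between original and reconstructed latents."""
    sq = [(a - b) ** 2 for row, row_hat in zip(h, h_hat) for a, b in zip(row, row_hat)]
    return sum(sq) / len(sq)


def total_loss(z_a, z_b, labels, h, h_hat, lam: float = 1.0) -> float:
    """Hypothetical combined objective: L_PU + lam * L_rec."""
    return soft_pu_contrastive_loss(z_a, z_b, labels) + lam * latent_reconstruction_loss(h, h_hat)
```

With aligned embeddings and fully labeled, distinct classes, the contrastive term is close to zero; marking one sample as unlabeled raises it, because the unlabeled cross pair now counts as a soft positive. How Robult actually weights unlabeled pairs, and which latents it reconstructs, would have to come from the paper itself.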