CV
Oct 8, 2023

Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models

arXiv:2310.05193v1
2 citations
h-index: 8
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck in multi-modal learning that affects both researchers and practitioners, and offers an incremental improvement over existing fine-tuning methods.

The paper tackles the problem of insufficient uni-modal feature learning in multi-modal models using large-scale pre-trained models, introducing MMLoRA to enhance inter-modal adaptation and improve performance across audio-visual, vision-language, and RGB-optical flow datasets.

This paper investigates how to better leverage large-scale pre-trained uni-modal models to further enhance discriminative multi-modal learning. Even when fine-tuned with only uni-modal data, these models can outperform previous multi-modal models in certain tasks. It's clear that their incorporation into multi-modal learning would significantly improve performance. However, multi-modal learning with these models still suffers from insufficient learning of uni-modal features, which weakens the resulting multi-modal model's generalization ability. While fine-tuning uni-modal models separately and then aggregating their predictions is straightforward, it doesn't allow for adequate adaptation between modalities, also leading to sub-optimal results. To this end, we introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA). By freezing the weights of uni-modal fine-tuned models, adding extra trainable rank decomposition matrices to them, and subsequently performing multi-modal joint training, our method enhances adaptation between modalities and boosts overall performance. We demonstrate the effectiveness of MMLoRA on three dataset categories: audio-visual (e.g., AVE, Kinetics-Sound, CREMA-D), vision-language (e.g., MM-IMDB, UPMC Food101), and RGB-Optical Flow (UCF101).
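The recipe in the abstract (freeze the uni-modal fine-tuned weights, add trainable rank decomposition matrices, then train the modalities jointly) can be illustrated with a minimal, dependency-free sketch. Everything named here is hypothetical and not taken from the paper: the `LoRALinear` class, the toy weights, the choice of rank 1, and the logit-averaging fusion in `fuse_logits` are assumptions made only for illustration. The abstract does not say which layers receive adapters or how predictions are fused.

```python
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) list-of-lists matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear map W0 plus a trainable low-rank update B @ A (hypothetical helper)."""

    def __init__(self, frozen_weight, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(frozen_weight), len(frozen_weight[0])
        # Stands in for a uni-modal fine-tuned weight; it is never updated afterwards.
        self.frozen_weight = frozen_weight
        # A starts small and random, B starts at zero, so B @ A is zero before training
        # and the adapted layer initially reproduces the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.frozen_weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + u for b, u in zip(base, update)]

    def num_trainable(self):
        # Only A and B would receive gradients during multi-modal joint training.
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


def fuse_logits(logits_a, logits_b):
    """Late fusion by averaging per-class logits (an assumed, simple choice)."""
    return [(a + b) / 2 for a, b in zip(logits_a, logits_b)]


# Toy two-class heads for two modalities with made-up weights and inputs.
audio_head = LoRALinear([[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]], rank=1)
visual_head = LoRALinear([[0.1, 0.5], [-0.3, 0.2]], rank=1, seed=1)
audio_x, visual_x = [1.0, 2.0, 3.0], [0.5, -1.0]
fused = fuse_logits(audio_head.forward(audio_x), visual_head.forward(visual_x))
```

In a joint-training loop, a loss on `fused` would update only `A` and `B` in each head, which is how the frozen uni-modal features are preserved while the adapters learn to adapt the modalities to each other. A real implementation would more plausibly use a deep-learning framework and apply the adapters inside the backbone rather than to a single linear head.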

