CV · LG · SD · AS · Apr 5, 2022

Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization

arXiv:2204.02485v13 citations · h-index: 16
Originality: Incremental advance
AI Analysis

The paper addresses robustness issues in multimodal learning for applications such as audio-visual tasks. The advance is incremental because it builds on existing fusion techniques.

The paper tackles robustness in multimodal fusion by proposing a training-free late-fusion method based on Jacobian regularization. Experiments on AV-MNIST, RAVDESS, and VGGsound show improved performance under adversarial attacks and random corruptions.

Multimodal fusion has emerged as an appealing technique for improving model performance on many tasks. Nevertheless, the robustness of such fusion methods is rarely addressed in the present literature. In this paper, we propose a training-free robust late-fusion method that exploits a conditional independence assumption and Jacobian regularization. Our key idea is to minimize the Frobenius norm of a Jacobian matrix, where the resulting optimization problem is relaxed to a tractable Sylvester equation. Furthermore, we provide a theoretical error bound for our method and some insights into the function of the extra modality. Several numerical experiments on AV-MNIST, RAVDESS, and VGGsound demonstrate the efficacy of our method under both adversarial attacks and random corruptions.
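The abstract names two building blocks: the Frobenius norm of a Jacobian matrix and a Sylvester equation. The sketch below illustrates both in isolation using NumPy only. It is not the paper's algorithm: the function names (`softmax_jacobian`, `solve_sylvester_kron`), the toy matrices, and the choice of the softmax Jacobian as the example are all assumptions made for illustration. The Sylvester equation is taken in its standard form A X + X B = Q and solved by Kronecker vectorization, which is only practical for small matrices.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def softmax_jacobian(z):
    """Jacobian of softmax w.r.t. its logits: diag(p) - p p^T."""
    p = softmax(z)
    return np.diag(p) - np.outer(p, p)


def solve_sylvester_kron(A, B, Q):
    """Solve A X + X B = Q for X via Kronecker vectorization.

    Uses vec(A X) = (I kron A) vec(X) and vec(X B) = (B^T kron I) vec(X)
    with column-major vec. Cost grows as O((m*n)^3), so small inputs only.
    """
    m, n = Q.shape
    K = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    x = np.linalg.solve(K, Q.flatten(order="F"))
    return x.reshape((m, n), order="F")


# Hypothetical toy inputs, not taken from the paper.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
N = rng.standard_normal((4, 4))
# Symmetric positive definite A and B guarantee a unique solution,
# because A and -B then share no eigenvalue.
A = M @ M.T + np.eye(3)
B = N @ N.T + np.eye(4)
Q = rng.standard_normal((3, 4))

X = solve_sylvester_kron(A, B, Q)
residual = np.linalg.norm(A @ X + X @ B - Q)  # Frobenius norm by default

# Frobenius norm of the softmax Jacobian at an arbitrary logit vector.
J = softmax_jacobian(np.array([2.0, 0.5, -1.0]))
jac_fro = np.linalg.norm(J, ord="fro")

print("Sylvester residual:", residual)
print("Jacobian Frobenius norm:", jac_fro)
```

For larger problems, `scipy.linalg.solve_sylvester(a, b, q)` solves the same A X + X B = Q form without building the Kronecker system.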

