LG | Apr 10, 2023

On Robustness in Multimodal Learning

arXiv:2304.04385v2 | 4 citations | h-index: 59
Originality: Incremental advance
AI Analysis

This paper addresses robustness issues in multimodal learning for applications on hardware platforms. The advance is incremental: it builds on existing methods with specific improvements.

The paper tackles the problem of multimodal models behaving differently when the available modalities differ between training and deployment. It proposes a robustness framework and two intervention techniques that achieve 1.5x-4x robustness improvements on AudioSet, Kinetics-400 and ImageNet-Captions, with competitive results of 44.2 mAP on AudioSet 20K.

Multimodal learning is defined as learning over multiple heterogeneous input modalities such as video, audio, and text. In this work, we are concerned with understanding how models behave as the types of modalities differ between training and deployment, a situation that naturally arises in many applications of multimodal learning to hardware platforms. We present a multimodal robustness framework to provide a systematic analysis of common multimodal representation learning methods. Further, we identify robustness shortcomings of these approaches and propose two intervention techniques leading to $1.5\times$-$4\times$ robustness improvements on three datasets: AudioSet, Kinetics-400 and ImageNet-Captions. Finally, we demonstrate that these interventions better utilize additional modalities, if present, to achieve competitive results of $44.2$ mAP on AudioSet 20K.
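
To make the train/deployment mismatch concrete, the sketch below scores one model on every non-empty subset of test-time modalities and summarizes the matched, average, and worst-case results. It is an illustration only, not the paper's framework: the function names, the `score_fn` callback, and the choice of average and worst-case summaries are all assumptions, and `toy_score` returns placeholder values rather than measured results.

```python
from itertools import combinations
from statistics import mean


def modality_subsets(modalities):
    """Yield every non-empty subset of the given modalities as a frozenset."""
    for size in range(1, len(modalities) + 1):
        for combo in combinations(modalities, size):
            yield frozenset(combo)


def robustness_summary(score_fn, train_modalities, all_modalities):
    """Score one trained model under every test-time modality subset.

    score_fn(train, test) -> float is a hypothetical callback returning a
    task metric (for example mAP) for a model trained on `train` and
    evaluated with only the modalities in `test` available.
    """
    train = frozenset(train_modalities)
    scores = {
        test: score_fn(train, test)
        for test in modality_subsets(all_modalities)
    }
    return {
        "matched": scores.get(train),       # test modalities equal training ones
        "average": mean(scores.values()),   # mean over all test subsets
        "worst_case": min(scores.values()), # most damaging test subset
    }


def toy_score(train, test):
    # Placeholder metric (Jaccard overlap of modality sets), NOT a result
    # from the paper. Replace with a real evaluation loop.
    return len(train & test) / len(train | test)


if __name__ == "__main__":
    summary = robustness_summary(
        toy_score,
        train_modalities=["video", "audio"],
        all_modalities=["video", "audio"],
    )
    print(summary)
```

A large gap between the matched score and the worst-case score would indicate the kind of sensitivity to missing or added modalities that the abstract describes.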

