LG · CV · NE · Mar 15, 2019

MFAS: Multimodal Fusion Architecture Search

Juan-Manuel Pérez-Rúa, Valentin Vielzeuf, Stéphane Pateux, Moez Baccouche, Frédéric Jurie

arXiv:1903.06496v1 · 26.9 · 209 citations

Originality: Highly original

AI Analysis

This work addresses the challenge of designing effective multimodal fusion architectures for classification tasks. It casts fusion design as a neural architecture search problem, an approach that applies across different domains and dataset sizes.

The authors tackled the problem of finding optimal architectures for multimodal classification by proposing a generic search space and using an efficient sequential model-based exploration approach, discovering fusion architectures that achieve state-of-the-art performance on datasets like NTU RGB+D.

We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. To find an optimal architecture for a given dataset in the proposed search space, we leverage an efficient sequential model-based exploration approach that is tailored to the problem. We demonstrate the value of posing multimodal fusion as a neural architecture search problem through extensive experimentation on a toy dataset and two real multimodal datasets. We discover fusion architectures that exhibit state-of-the-art performance on problems of different domains and dataset sizes, including the NTU RGB+D dataset, the largest multimodal action recognition dataset available.
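The abstract does not spell out the search space or the surrogate model, so the following is only a minimal sketch of what "sequential model-based exploration" over a fusion search space can look like. Everything concrete in it is an assumption: each fusion layer is taken to pick one feature layer from modality A, one from modality B, and a nonlinearity; `toy_score` is a hypothetical stand-in for training a fusion network and measuring validation accuracy; and the per-choice averaging surrogate, layer counts and beam width are illustrative choices, not the authors' method.

```python
import itertools
from statistics import mean

# Hypothetical search space (assumption, not from the abstract): a fusion layer
# is (feature layer of modality A, feature layer of modality B, nonlinearity),
# and an architecture is a tuple of fusion layers.
LAYERS_A = range(4)
LAYERS_B = range(4)
ACTIVATIONS = ("relu", "sigmoid", "leaky_relu")
CHOICES = list(itertools.product(LAYERS_A, LAYERS_B, ACTIVATIONS))  # 48 choices


def toy_score(arch):
    """Hypothetical stand-in for 'train this fusion net, return val. accuracy'.

    Deterministic toy objective: deeper feature layers and ReLU score higher,
    plus a small bonus per fusion layer.
    """
    per_layer = [(a + b) / 6 + (0.1 if act == "relu" else 0.0) for a, b, act in arch]
    return mean(per_layer) + 0.01 * len(arch)


def fit_surrogate(history):
    """Cheap surrogate: mean observed score of the architectures using each choice."""
    scores_by_choice = {}
    for arch, score in history:
        for choice in arch:
            scores_by_choice.setdefault(choice, []).append(score)
    choice_mean = {c: mean(s) for c, s in scores_by_choice.items()}
    fallback = mean(score for _, score in history)

    def predict(arch):
        return mean(choice_mean.get(c, fallback) for c in arch)

    return predict


def sequential_search(max_depth=3, beam=5):
    """Grow architectures one fusion layer at a time; the surrogate decides what to train."""
    history = []  # (architecture, observed score) pairs
    candidates = [(c,) for c in CHOICES]  # depth 1 is small enough to evaluate exhaustively
    for depth in range(1, max_depth + 1):
        if depth > 1:
            # Only the `beam` candidates ranked highest by the surrogate get "trained".
            predict = fit_surrogate(history)
            candidates = sorted(candidates, key=predict, reverse=True)[:beam]
        for arch in candidates:
            history.append((arch, toy_score(arch)))
        # Extend the best evaluated architectures of this depth by one fusion layer.
        best_here = sorted(
            (h for h in history if len(h[0]) == depth), key=lambda h: h[1], reverse=True
        )[:beam]
        candidates = [arch + (c,) for arch, _ in best_here for c in CHOICES]
    best_arch, best_score = max(history, key=lambda h: h[1])
    return best_arch, best_score, history
```

Calling `sequential_search()` trains all 48 one-layer architectures, then only 5 surrogate-selected candidates at each deeper level, 58 evaluations in total, whereas exhaustively scoring every three-layer architecture would take 48³ = 110,592. Spending real training only on candidates the surrogate ranks highly is what makes this style of search tractable when each evaluation means training a network.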

