CV | Mar 10, 2023

Single-branch Network for Multimodal Training

arXiv:2303.06129v1 | 32 citations | h-index: 22 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the need for more efficient, unified systems for handling multimedia data in applications such as social media analysis, though it appears incremental because it builds on existing multimodal frameworks.

The paper addresses the processing of multimodal data (audio, images, and text) for tasks such as cross-modal verification and matching. It proposes a single-branch network that learns discriminative representations without a separate network for each modality, and it reports better performance than existing methods in its experiments.

With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks, including cross-modal retrieval, matching, and verification. Existing works use a separate network to extract the embeddings of each modality and bridge the gap between them. The modular structure of these branched networks is fundamental to numerous multimodal applications and has become a de facto standard for handling multiple modalities. In contrast, we propose a novel single-branch network capable of learning discriminative representations for unimodal as well as multimodal tasks without changing the network. An important feature of our single-branch network is that it can be trained using either single or multiple modalities without sacrificing performance. We evaluated the proposed single-branch network on a challenging multimodal problem (face-voice association) for cross-modal verification and matching tasks with various loss formulations. Experimental results demonstrate the superiority of the proposed single-branch network over existing methods in a wide range of experiments. Code: https://github.com/msaadsaeed/SBNet
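The single-branch idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that face and voice features have already been extracted and share the same dimensionality, and that the branch is a single fully connected layer with ReLU and L2 normalisation. The names `SingleBranch`, `embed`, and `cosine_score` are hypothetical, and the weights are random rather than trained.

```python
import math
import random


class SingleBranch:
    """One shared fully connected layer used for every modality (illustrative only)."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # A single weight matrix: there is no per-modality copy.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def embed(self, features):
        # Linear layer followed by ReLU.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.weights, self.bias)]
        # L2-normalise so that a dot product equals cosine similarity.
        norm = math.sqrt(sum(h * h for h in hidden)) or 1.0
        return [h / norm for h in hidden]


def cosine_score(a, b):
    """Dot product of two L2-normalised embeddings."""
    return sum(x * y for x, y in zip(a, b))


# Assumed inputs: pre-extracted face and voice features of equal size.
rng = random.Random(1)
face_features = [rng.random() for _ in range(8)]
voice_features = [rng.random() for _ in range(8)]

branch = SingleBranch(in_dim=8, out_dim=4)
face_emb = branch.embed(face_features)    # same weights ...
voice_emb = branch.embed(voice_features)  # ... for both modalities
score = cosine_score(face_emb, voice_emb)
print(f"face-voice similarity: {score:.3f}")
```

In a verification setting, a score like this would be compared against a threshold to decide whether a face and a voice belong to the same identity. With untrained weights the value carries no meaning; the sketch only shows that one set of parameters serves both modalities.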

Code Implementations: 1 repo