LGCLCVMLSep 19, 2019

Learning Sparse Mixture of Experts for Visual Question Answering

arXiv:1909.09192v13 citations
Originality Synthesis-oriented
AI Analysis

This work addresses deployment challenges for VQA systems, but it is incremental as it focuses on efficiency improvements within existing frameworks.

The paper tackles the computational intensity of Visual Question Answering models by proposing a sparse mixture of experts architecture for the CNN module, achieving comparable performance to standard models.

There has been a rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question Answering (VQA). A Convolutional Neural Network (CNN) is an integral part of the visual processing pipeline of a VQA model (assuming the CNN is trained along with entire VQA model). In this project, we propose an efficient and modular neural architecture for the VQA task with focus on the CNN module. Our experiments demonstrate that a sparsely activated CNN based VQA model achieves comparable performance to a standard CNN based VQA model architecture.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes