CV · AI · Jun 8, 2018

CS-VQA: Visual Question Answering with Compressively Sensed Images

arXiv:1806.03379v1
9 citations
Originality: Incremental advance
AI Analysis

Answering questions directly from compressively sensed images reduces data acquisition requirements, which enables VQA applications in resource-constrained environments such as mobile or embedded systems.

The paper tackles Visual Question Answering (VQA) on compressively sensed images, which are captured at sub-Nyquist rates. It shows that VQA is solvable in this compressed domain with only a small drop in performance, and that the lost accuracy can be recovered by pairing the VQA pipeline with deep neural networks for compressive sensing (CS) reconstruction.

Visual Question Answering (VQA) is a complex semantic task requiring both natural language processing and visual recognition. In this paper, we explore whether VQA is solvable when images are captured in a sub-Nyquist compressive paradigm. We develop a series of deep-network architectures that exploit available compressive data to increasing degrees of accuracy, and show that VQA is indeed solvable in the compressed domain. Our results show that there is nominal degradation in VQA performance when using compressive measurements, but that accuracy can be recovered when VQA pipelines are used in conjunction with state-of-the-art deep neural networks for CS reconstruction. The results presented yield important implications for resource-constrained VQA applications.
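To make the "sub-Nyquist compressive paradigm" concrete, here is a minimal sketch of how a compressive measurement is commonly modeled: a flattened image x with n pixels is projected to m < n values, y = Φx. The random Gaussian sensing matrix, the 8x8 patch size, the measurement rate of 0.25, and the helper names are all assumptions chosen for illustration. The abstract does not specify the sensing operator or measurement rates the paper uses.

```python
import random


def make_measurement_matrix(m, n, seed=0):
    """Build a random Gaussian sensing matrix with m rows and n columns.

    Assumption: Gaussian entries scaled by 1/sqrt(m), a common textbook
    choice, not necessarily the operator used in the paper.
    """
    rng = random.Random(seed)
    scale = 1.0 / (m ** 0.5)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def compressive_measure(phi, x):
    """Compute the measurement vector y = phi @ x for a flattened image x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


# Hypothetical 8x8 grayscale patch, flattened to n = 64 values in [0, 1).
n = 64
measurement_rate = 0.25  # assumed ratio m / n, not taken from the paper
m = int(measurement_rate * n)

rng = random.Random(1)
x = [rng.random() for _ in range(n)]

phi = make_measurement_matrix(m, n)
y = compressive_measure(phi, x)

print(len(x), len(y))
```

With these assumed settings the 64-pixel patch is reduced to 16 measurements. A VQA pipeline in the compressed domain would then either consume y directly or first pass it through a reconstruction network, which are the two broad options the abstract describes.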
