CV · AI · CL · LG · Apr 21, 2024

Exploring Diverse Methods in Visual Question Answering

arXiv:2404.13565v3 · 66 citations · h-index: 11 · 2024 5th International Conference on Electronic Communication and Artificial Intelligence (ICECAI)
Originality: Synthesis-oriented
AI Analysis

This work addresses VQA challenges for AI researchers, but it is incremental as it investigates existing methods without introducing a new paradigm.

This study tackled the problem of improving Visual Question Answering (VQA) by exploring GANs, autoencoders, and attention mechanisms, finding that autoencoders achieved comparable results to GANs with better performance on complex questions, while attention mechanisms faced a complexity-performance trade-off.

This study explores innovative methods for improving Visual Question Answering (VQA) using Generative Adversarial Networks (GANs), autoencoders, and attention mechanisms. Using a balanced VQA dataset, we investigate three distinct strategies. First, GAN-based approaches generate answer embeddings conditioned on image and question inputs; they show potential but struggle with more complex tasks. Second, autoencoder-based techniques learn optimal embeddings for questions and images and achieve results comparable to the GAN-based approach, owing to their better handling of complex questions. Finally, attention mechanisms incorporating Multimodal Compact Bilinear pooling (MCB) address language priors and attention modeling, albeit with a complexity-performance trade-off. This study underscores the challenges and opportunities in VQA and suggests avenues for future research, including alternative GAN formulations and attention mechanisms.
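
To make the MCB step mentioned in the abstract concrete, here is a minimal NumPy sketch of Multimodal Compact Bilinear pooling as it is generally described: each feature vector is projected with a random count sketch, and the two sketches are combined by circular convolution computed through the FFT. This is an illustration under stated assumptions, not the authors' implementation. The function names (`make_sketch_params`, `count_sketch`, `mcb_pool`) and all dimensions are hypothetical, and the attention layers built on top of the fused vector are omitted.

```python
import numpy as np


def make_sketch_params(input_dim, output_dim, rng):
    """Draw the fixed random hash indices and signs for one count sketch.

    Hypothetical helper: in practice these are sampled once and kept frozen.
    """
    h = rng.integers(0, output_dim, size=input_dim)   # target bucket per input index
    s = rng.choice([-1.0, 1.0], size=input_dim)       # random sign per input index
    return h, s


def count_sketch(x, h, s, output_dim):
    """Project x to output_dim by adding s[i] * x[i] into bucket h[i]."""
    y = np.zeros(output_dim)
    np.add.at(y, h, s * x)  # unbuffered add, so colliding indices accumulate
    return y


def mcb_pool(image_feat, question_feat, params_img, params_q, output_dim):
    """Fuse two feature vectors with compact bilinear pooling.

    The circular convolution of the two count sketches equals a count
    sketch of the outer product of the inputs, so the full bilinear
    interaction is approximated without materialising it.
    """
    sketch_img = count_sketch(image_feat, *params_img, output_dim)
    sketch_q = count_sketch(question_feat, *params_q, output_dim)
    # Element-wise product in the frequency domain is circular convolution.
    spectrum = np.fft.rfft(sketch_img) * np.fft.rfft(sketch_q)
    return np.fft.irfft(spectrum, n=output_dim)
```

A typical call would draw the sketch parameters once, for example `rng = np.random.default_rng(0)` followed by `make_sketch_params(2048, 16000, rng)` for the image features and `make_sketch_params(1024, 16000, rng)` for the question features (all three sizes are illustrative assumptions), and then reuse them for every image-question pair. The output dimension is the knob behind the complexity-performance trade-off the abstract mentions: a larger sketch approximates the outer product more faithfully but costs more memory and compute.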

