An Efficient Modern Baseline for FloodNet VQA
This work addresses the need for efficient VQA systems in disaster response, though the contribution is incremental because it builds on known methods.
The authors address the design of efficient and reliable visual question answering (VQA) systems for disaster management by revisiting fundamental combination methods with modern feature models. The resulting system outperforms existing methods on the FloodNet dataset, achieving state-of-the-art performance with significantly less training and inference time than modern VQA architectures.
Designing efficient and reliable VQA systems remains a challenging problem, more so in the case of disaster management and response systems. In this work, we revisit fundamental combination methods like concatenation, addition and element-wise multiplication with modern image and text feature abstraction models. We design a simple and efficient system which outperforms pre-existing methods on the FloodNet dataset and achieves state-of-the-art performance. This simplified system requires significantly less training and inference time than modern VQA architectures. We also study the performance of various backbones and report their consolidated results. Code is available at https://github.com/sahilkhose/floodnet_vqa.
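The three combination methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function name `fuse`, the `mode` strings, and the toy feature values are hypothetical, and plain Python lists stand in for the tensors a real backbone would produce. The sketch assumes that addition and element-wise multiplication require image and text features of equal length, while concatenation does not.

```python
from typing import List

Vector = List[float]


def fuse(image_feat: Vector, text_feat: Vector, mode: str = "concat") -> Vector:
    """Combine one image feature vector with one question feature vector.

    Hypothetical helper for illustration only.
    """
    if mode == "concat":
        # Concatenation: output length is len(image_feat) + len(text_feat).
        return image_feat + text_feat
    if len(image_feat) != len(text_feat):
        raise ValueError("add and multiply need equal-length vectors")
    if mode == "add":
        # Element-wise addition: output length equals the input length.
        return [i + t for i, t in zip(image_feat, text_feat)]
    if mode == "multiply":
        # Element-wise (Hadamard) multiplication: same output length.
        return [i * t for i, t in zip(image_feat, text_feat)]
    raise ValueError(f"unknown fusion mode: {mode}")


# Toy features; real ones would come from image and text backbones.
image_feat = [0.5, 1.0, -2.0]
text_feat = [2.0, 0.0, 1.5]

concat = fuse(image_feat, text_feat, "concat")
added = fuse(image_feat, text_feat, "add")
multiplied = fuse(image_feat, text_feat, "multiply")
```

In a setup like this, the fused vector would typically be passed to a small classifier over candidate answers. Concatenation doubles the input width of that classifier, whereas addition and multiplication keep it unchanged.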