CV, CL · Dec 12, 2016

VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering

arXiv:1612.03628v1 · 6 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses visual question answering, but the contribution appears incremental: it builds on existing methods such as CNNs and LSTMs.

The paper tackles visual question answering by proposing VIBIKNet, a model integrating Kernelized CNNs and LSTMs, achieving an optimal trade-off between accuracy and computational load in terms of memory and time consumption.

In this paper, we address the problem of visual question answering by proposing a novel model, called VIBIKNet. Our model is based on integrating Kernelized Convolutional Neural Networks and Long Short-Term Memory units to generate an answer given a question about an image. We prove that VIBIKNet is an optimal trade-off between accuracy and computational load, in terms of memory and time consumption. We validate our method on the VQA challenge dataset and compare it to the top performing methods in order to illustrate its performance and speed.

Code Implementations: 1 repo
