CV | Aug 22, 2022

Neuro-Symbolic Visual Dialog

arXiv:2208.10353v1580 citations | h-index: 63
Originality: Highly original
AI Analysis

This work addresses two challenges in visual dialog: long-distance co-reference resolution and vanishing question-answering performance. It aims at a more robust and generalizable approach for AI systems that handle complex, multi-round visual interactions.

The authors tackle multi-round visually grounded reasoning by combining deep learning with symbolic program execution. Their best model reaches 99.72% accuracy on CLEVR-Dialog, a relative improvement of more than 10% over the state of the art, while requiring only a fraction of the training data.

We propose Neuro-Symbolic Visual Dialog (NSVD), the first method to combine deep learning and symbolic program execution for multi-round visually-grounded reasoning. NSVD significantly outperforms existing purely-connectionist methods on two key challenges inherent to visual dialog: long-distance co-reference resolution as well as vanishing question-answering performance. We demonstrate the latter by proposing a more realistic and stricter evaluation scheme in which we use predicted answers for the full dialog history when calculating accuracy. We describe two variants of our model and show that using this new scheme, our best model achieves an accuracy of 99.72% on CLEVR-Dialog, a relative improvement of more than 10% over the state of the art, while only requiring a fraction of training data. Moreover, we demonstrate that our neuro-symbolic models have a higher mean first failure round, are more robust against incomplete dialog histories, and generalise better not only to dialogs that are up to three times longer than those seen during training but also to unseen question types and scenes.
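
The stricter evaluation scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The `answer_fn` interface, the `(question, answer)` history format, and the toy model are hypothetical. The exact definition of "mean first failure round" is also an assumption here: it is taken as the 1-based index of the first wrongly answered round, with a fully correct dialog counted as its length plus one.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a dialog is a sequence of (question, ground-truth answer) rounds.
Round = Tuple[str, str]
History = List[Tuple[str, str]]
AnswerFn = Callable[[str, History], str]


def evaluate_dialog(answer_fn: AnswerFn, dialog: Sequence[Round],
                    use_predicted_history: bool = True) -> List[bool]:
    """Return per-round correctness for one dialog."""
    history: History = []
    correct: List[bool] = []
    for question, gt_answer in dialog:
        predicted = answer_fn(question, history)
        correct.append(predicted == gt_answer)
        # Stricter scheme: carry the model's own answer forward, so an early
        # mistake can propagate. Otherwise the ground-truth answer is used.
        carried = predicted if use_predicted_history else gt_answer
        history.append((question, carried))
    return correct


def accuracy(results: Sequence[Sequence[bool]]) -> float:
    """Fraction of correctly answered rounds over all dialogs."""
    total = sum(len(rounds) for rounds in results)
    return sum(sum(rounds) for rounds in results) / total


def mean_first_failure_round(results: Sequence[Sequence[bool]]) -> float:
    """Average 1-based index of the first wrong round per dialog.

    Assumption: a dialog with no failure counts as len(dialog) + 1.
    """
    firsts = [
        next((i + 1 for i, ok in enumerate(rounds) if not ok), len(rounds) + 1)
        for rounds in results
    ]
    return sum(firsts) / len(firsts)


def toy_answer_fn(question: str, history: History) -> str:
    """Hypothetical model: 'repeat' echoes the last answer in the history."""
    if question == "repeat":
        return history[-1][1] if history else "none"
    return {"color?": "red", "shape?": "cube"}.get(question, "unknown")


dialog = [("color?", "blue"), ("repeat", "blue"), ("shape?", "cube")]
strict = evaluate_dialog(toy_answer_fn, dialog, use_predicted_history=True)
lenient = evaluate_dialog(toy_answer_fn, dialog, use_predicted_history=False)
```

In this toy dialog the first-round mistake also breaks the second round under the stricter scheme, because the wrong answer is what the model later refers back to. With ground-truth history the second round would be scored as correct.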

Code Implementations: 1 repo