Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation
This work addresses a scalability problem in task-oriented visual dialog systems and offers a significant performance improvement for applications such as visual question answering, although it is incremental in that it builds on the existing AQM framework.
The authors address the difficulty that the Answerer in Questioner's Mind (AQM) framework has with large-scale solution spaces by proposing AQM+, which also asks questions that are more coherent with the dialog context. The results show that AQM+ outperforms state-of-the-art models on the GuessWhich task with nearly 10K candidate classes: it reduces error by more than 60% as the dialog proceeds, whereas the comparative algorithms reduce it by less than 6%.
Answerer in Questioner's Mind (AQM) is an information-theoretic framework that has recently been proposed for task-oriented dialog systems. AQM benefits from asking the question that would maximize the information gain when it is asked. However, because it explicitly calculates the information gain, AQM is limited when the solution space is very large. To address this, we propose AQM+, which can deal with a large-scale problem and ask a question that is more coherent with the current context of the dialog. We evaluate our method on GuessWhich, a challenging task-oriented visual dialog problem in which the number of candidate classes is close to 10K. Our experimental results and ablation studies show that AQM+ outperforms the state-of-the-art models by a remarkable margin with a reasonable approximation. In particular, the proposed AQM+ reduces error by more than 60% as the dialog proceeds, while the comparative algorithms reduce it by less than 6%. Based on our results, we argue that AQM+ is a general task-oriented dialog algorithm that can be applied to non-yes-or-no responses.
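To make the scalability issue concrete, here is a minimal sketch of information-gain-based question selection. It is an illustration under stated assumptions, not the authors' implementation: it assumes a prior p(c) over candidate classes and an answer likelihood p(a | c, q) for each question are available as plain lists. The function names (`information_gain`, `select_question`, `top_k_restrict`) and the toy data are hypothetical. The top-k restriction is only one plausible way to approximate the sum over a large candidate set; the abstract does not spell out the approximation that AQM+ actually uses.

```python
import math


def information_gain(prior, likelihood):
    """Mutual information between the class C and the answer A for one question.

    prior:      list of p(c), one entry per candidate class
    likelihood: list of rows, likelihood[c][a] = p(a | c, q)
    """
    n_answers = len(likelihood[0])
    # Marginal answer distribution: p(a | q) = sum_c p(c) * p(a | c, q)
    marginal = [
        sum(p_c * row[a] for p_c, row in zip(prior, likelihood))
        for a in range(n_answers)
    ]
    gain = 0.0
    # The explicit double sum over classes and answers is what becomes
    # expensive when the number of candidate classes is very large.
    for p_c, row in zip(prior, likelihood):
        for a, p_a in enumerate(row):
            if p_c > 0.0 and p_a > 0.0:
                gain += p_c * p_a * math.log(p_a / marginal[a])
    return gain


def select_question(prior, likelihoods):
    """Return the question whose answer is expected to be most informative."""
    return max(likelihoods, key=lambda q: information_gain(prior, likelihoods[q]))


def top_k_restrict(prior, k):
    """Hypothetical approximation: keep the k most probable classes, renormalized."""
    indices = sorted(range(len(prior)), key=lambda c: prior[c], reverse=True)[:k]
    total = sum(prior[c] for c in indices)
    return indices, [prior[c] / total for c in indices]


# Toy example with two classes and two candidate questions (made-up numbers).
prior = [0.5, 0.5]
likelihoods = {
    "informative": [[1.0, 0.0], [0.0, 1.0]],    # the answer identifies the class
    "uninformative": [[0.5, 0.5], [0.5, 0.5]],  # the answer is independent of the class
}
best = select_question(prior, likelihoods)
```

In this toy setting the first question separates the two classes perfectly, so its information gain equals log 2, while the second question has zero gain. With close to 10K candidate classes, the exact sum runs over every class for every candidate question and answer, which is why some form of approximation is needed.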