Examining Cooperation in Visual Dialog Models
This work addresses the need for sanity checks in visual dialog systems, but its contribution is incremental: it applies an existing intervention approach to one specific domain.
The authors tackle the problem of assessing the contribution of individual components in visual dialog models. They propose a blackbox intervention method that impairs a single component and measures the resulting change in task performance. They find that both dialog and image information contribute minimally to task performance.
In this work we propose a blackbox intervention method for visual dialog models, with the aim of assessing the contribution of individual linguistic or visual components. Concretely, we conduct structured or randomized interventions that aim to impair an individual component of the model, and observe changes in task performance. We reproduce a state-of-the-art visual dialog model and demonstrate that our methodology yields surprising insights, namely that both dialog and image information have minimal contributions to task performance. The intervention method presented here can be applied as a sanity check for the strength and robustness of each component in visual dialog systems.
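The intervention idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy models, the example data, and the helper names (`accuracy`, `shuffle_field`, `blank_field`, `contribution`) are all hypothetical and chosen only for illustration. The sketch treats the model as a black box, impairs one input (image or dialog) with either a randomized intervention (shuffling across examples) or a structured one (replacing with a fixed value), and reports the drop in accuracy.

```python
import random
from typing import Any, Callable, Dict, List

# Hypothetical data layout: each example has an image, a dialog history,
# a question, and a gold answer. Real visual dialog data is richer.
Example = Dict[str, Any]
Model = Callable[[Example], str]
Intervention = Callable[[List[Example]], List[Example]]


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples where the black-box model returns the gold answer."""
    correct = sum(model(ex) == ex["answer"] for ex in examples)
    return correct / len(examples)


def shuffle_field(examples: List[Example], field: str, seed: int = 0) -> List[Example]:
    """Randomized intervention: permute one input field across examples."""
    rng = random.Random(seed)
    values = [ex[field] for ex in examples]
    rng.shuffle(values)
    return [{**ex, field: value} for ex, value in zip(examples, values)]


def blank_field(examples: List[Example], field: str, blank: Any) -> List[Example]:
    """Structured intervention: replace one input field with a fixed value."""
    return [{**ex, field: blank} for ex in examples]


def contribution(model: Model, examples: List[Example], intervene: Intervention) -> float:
    """Accuracy drop caused by the intervention (baseline minus impaired)."""
    return accuracy(model, examples) - accuracy(model, intervene(examples))


EXAMPLES: List[Example] = [
    {"image": "img_a", "dialog": ["a park"], "question": "is it sunny?", "answer": "yes"},
    {"image": "img_b", "dialog": ["a yard"], "question": "how many dogs?", "answer": "2"},
    {"image": "img_c", "dialog": ["a street"], "question": "is there a car?", "answer": "no"},
    {"image": "img_d", "dialog": ["a beach"], "question": "how many people?", "answer": "2"},
]

# Toy model that only looks at the question (ignores image and dialog).
def question_only_model(ex: Example) -> str:
    return "yes" if ex["question"].startswith("is ") else "2"

# Toy model that relies entirely on the image.
IMAGE_ANSWERS = {"img_a": "yes", "img_b": "2", "img_c": "no", "img_d": "2"}

def image_model(ex: Example) -> str:
    return IMAGE_ANSWERS.get(ex["image"], "unknown")


if __name__ == "__main__":
    shuffle_images = lambda xs: shuffle_field(xs, "image")
    blank_images = lambda xs: blank_field(xs, "image", None)
    print("question-only model, shuffled images:",
          contribution(question_only_model, EXAMPLES, shuffle_images))
    print("image model, blanked images:",
          contribution(image_model, EXAMPLES, blank_images))
```

In this toy setting, a model that ignores the image shows no accuracy drop when images are shuffled, while a model that depends on the image loses all of its accuracy when images are blanked. A near-zero drop for a real model would be the kind of signal the abstract describes: the impaired component contributes little to task performance.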