Calibrating Trust of Multi-Hop Question Answering Systems with Decompositional Probes
The work addresses user-facing explainability in NLP, though the contribution is incremental because it builds on existing question decomposition methods.
The paper tackles the challenge of calibrating user trust in multi-hop question answering systems by using decompositional probes as explanations, and finds that exposing these probes increases users' ability to predict system performance.
Multi-hop Question Answering (QA) is a challenging task because it requires accurately aggregating information from multiple context paragraphs and thoroughly understanding the underlying reasoning chains. Recent work in multi-hop QA has shown that performance can be boosted by first decomposing questions into simpler, single-hop questions. In this paper, we explore an additional use of multi-hop decomposition from the perspective of explainable NLP: creating explanations by probing a neural QA model with the decomposed questions. We hypothesize that, given such explanations, users will be better able to predict when the underlying QA system will give the correct answer. Through human participant studies, we verify that showing users the decomposition probes and the system's answers to those probes increases their ability to predict system performance on a per-question basis. We show that decomposition is an effective form of probing QA systems as well as a promising approach to explanation generation. In-depth analyses show the need for improvements in decomposition systems.
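The probing idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `build_explanation`, `prediction_accuracy`, `toy_decompose`, and `toy_answer`, as well as the `#PREV` placeholder for the previous sub-answer, are hypothetical and chosen only for illustration. A lookup table stands in for both the decomposition system and the neural QA model, and the accuracy function is one plausible way to score how well a user predicts the system's correctness.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    sub_question: str
    answer: str


def build_explanation(
    question: str,
    context: str,
    decompose: Callable[[str], List[str]],
    answer_fn: Callable[[str, str], str],
) -> List[ProbeResult]:
    """Probe the QA model with each single-hop sub-question, in order.

    Assumption: a sub-question may contain the hypothetical placeholder
    "#PREV", which is filled with the answer to the preceding probe.
    """
    probes: List[ProbeResult] = []
    previous_answer = ""
    for template in decompose(question):
        sub_question = template.replace("#PREV", previous_answer)
        answer = answer_fn(sub_question, context)
        probes.append(ProbeResult(sub_question, answer))
        previous_answer = answer
    return probes


def prediction_accuracy(user_predictions: List[bool], system_correct: List[bool]) -> float:
    """Fraction of questions where the user's guess about correctness matched reality."""
    if len(user_predictions) != len(system_correct):
        raise ValueError("both lists must cover the same questions")
    matches = sum(p == c for p, c in zip(user_predictions, system_correct))
    return matches / len(system_correct)


# Toy stand-ins (assumptions): a fixed decomposition and a lookup-table "QA model".
def toy_decompose(question: str) -> List[str]:
    return ["Who directed Inception?", "Where was #PREV born?"]


TOY_ANSWERS = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
}


def toy_answer(sub_question: str, context: str) -> str:
    return TOY_ANSWERS.get(sub_question, "unknown")


if __name__ == "__main__":
    explanation = build_explanation(
        "Where was the director of Inception born?", "", toy_decompose, toy_answer
    )
    for probe in explanation:
        print(f"Q: {probe.sub_question}  A: {probe.answer}")
```

In a study setting like the one the abstract describes, the printed probe and answer pairs would be what a participant sees before guessing whether the system's final answer is correct; `prediction_accuracy` would then compare those guesses against the system's actual correctness.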