Towards Interpretable and Reliable Reading Comprehension: A Pipeline Model with Unanswerability Prediction
The work addresses the need for more trustworthy AI in NLP by improving the interpretability and reliability of multi-hop QA. The contribution is incremental, however, because it builds on existing pipeline and unanswerability-prediction methods.
The paper tackles the problem of making reading comprehension models interpretable and reliable by predicting unanswerable queries. On a modified HotpotQA dataset, the proposed end-to-end trainable pipeline model outperformed a non-interpretable model. It also achieved results comparable to those of previous non-interpretable models, despite the trade-off between prediction performance and interpretability.
Multi-hop QA with annotated supporting facts, a reading comprehension (RC) task that takes the interpretability of the answer into account, has been studied extensively. In this study, we define an interpretable reading comprehension (IRC) model as a pipeline model capable of predicting unanswerable queries. For interpretability, the IRC model justifies its answer prediction by establishing consistency between the predicted supporting facts and the actual rationale. For reliability, the IRC model detects unanswerable questions instead of forcing an answer from insufficient information. We also propose an end-to-end training method for the pipeline RC model. To evaluate interpretability and reliability, we conducted experiments on multi-hop questions that may be unanswerable from the given passage. We show that our end-to-end trainable pipeline model outperformed a non-interpretable model on our modified HotpotQA dataset. Experimental results also show that the IRC model achieves results comparable to those of previous non-interpretable models, despite the trade-off between prediction performance and interpretability.
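The two properties described above (the answer is justified by the predicted supporting facts, and the model abstains when evidence is insufficient) can be illustrated with a minimal inference-time sketch in Python. This is not the authors' implementation. The function names `select_facts`, `answer_from_facts`, and `irc_pipeline`, the `UNANSWERABLE` label, both thresholds, and the toy scoring and answering functions are all hypothetical stand-ins for trained models. The sketch does not model the paper's end-to-end training method.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical abstention label; not taken from the paper.
UNANSWERABLE = "<unanswerable>"

# score_fn(question, sentence) -> relevance score (toy stand-in for a trained selector).
ScoreFn = Callable[[str, str], float]
# answer_fn(question, facts) -> (answer or None, confidence) (toy stand-in for a trained reader).
AnswerFn = Callable[[str, List[str]], Tuple[Optional[str], float]]


@dataclass
class Prediction:
    supporting_facts: List[int]  # indices of the sentences the reader was allowed to see
    answer: str                  # an answer string or UNANSWERABLE


def select_facts(question: str, sentences: List[str], score_fn: ScoreFn,
                 threshold: float = 0.5) -> List[int]:
    """Stage 1: keep sentences whose score clears a hypothetical threshold."""
    return [i for i, s in enumerate(sentences) if score_fn(question, s) >= threshold]


def answer_from_facts(question: str, facts: List[str], answer_fn: AnswerFn,
                      min_conf: float = 0.5) -> str:
    """Stage 2: answer from the selected facts only, or abstain."""
    if not facts:
        return UNANSWERABLE
    answer, conf = answer_fn(question, facts)
    if answer is None or conf < min_conf:
        # Abstain rather than force an answer from insufficient evidence.
        return UNANSWERABLE
    return answer


def irc_pipeline(question: str, sentences: List[str],
                 score_fn: ScoreFn, answer_fn: AnswerFn) -> Prediction:
    idx = select_facts(question, sentences, score_fn)
    facts = [sentences[i] for i in idx]
    # The reader never sees unselected sentences, so the predicted
    # supporting facts are, by construction, the evidence the answer used.
    return Prediction(idx, answer_from_facts(question, facts, answer_fn))


# Toy stand-ins for trained models (illustration only).
def toy_score(question: str, sentence: str) -> float:
    return 1.0 if ("Film X" in sentence or "Alice" in sentence) else 0.0


def toy_answer(question: str, facts: List[str]) -> Tuple[Optional[str], float]:
    return ("France", 0.9) if any("born in France" in f for f in facts) else (None, 0.0)


if __name__ == "__main__":
    q = "In which country was the director of Film X born?"
    full = ["Film X was directed by Alice.", "Alice was born in France.", "Bob likes tea."]
    print(irc_pipeline(q, full, toy_score, toy_answer))
    # Prediction(supporting_facts=[0, 1], answer='France')
    partial = ["Film X was directed by Alice.", "Bob likes tea."]  # second hop removed
    print(irc_pipeline(q, partial, toy_score, toy_answer))
    # Prediction(supporting_facts=[0], answer='<unanswerable>')
```

The design choice the sketch isolates is the hard boundary between the two stages: because the reader only receives the selected sentences, the reported supporting facts cannot diverge from the evidence actually used. A hard threshold like this is not differentiable, so on its own it would not support end-to-end training; how the paper achieves end-to-end training is not described in this section.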