Generative Context Pair Selection for Multi-hop Question Answering
This work addresses robustness issues in multi-hop QA, though the contribution is incremental because it builds on existing generative approaches.
The paper tackles biases in multi-hop question answering models by proposing a generative context selection model that reasons about how the question could have been generated from a context pair. It achieves answering performance comparable to the state of the art and performs 4.9% better than the baseline on an adversarial held-out set that tests the robustness of multi-hop reasoning.
Compositional reasoning tasks, such as multi-hop question answering, require making latent decisions to reach the final answer given a question. However, crowdsourced datasets often capture only a slice of the underlying task distribution, which can induce unanticipated biases in models performing compositional reasoning. Furthermore, discriminatively trained models exploit such biases to achieve better held-out performance without learning the right way to reason, because estimating the answer likelihood does not require them to attend to the question representation (the conditioning variable) in its entirety. In this work, we propose a generative context selection model for multi-hop question answering that reasons about how the given question could have been generated given a context pair. While achieving answering performance comparable to the state of the art, our proposed generative passage selection model performs better (4.9% higher than the baseline) on an adversarial held-out set that tests the robustness of the model's multi-hop reasoning capabilities.
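The idea of selecting a context pair by asking how well it explains the question can be illustrated with a minimal sketch. This is not the paper's model: it swaps in a toy add-alpha smoothed unigram language model as a stand-in for a learned question generator, and the names `tokenize`, `question_log_likelihood`, `select_context_pair`, and the parameter `alpha` are hypothetical. What it shows is only the generative scoring rule: every question token contributes to log p(question | context pair), so a pair cannot score well while ignoring part of the question.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def question_log_likelihood(question, context_pair, vocab_size, alpha=0.1):
    """log p(question | context pair) under an add-alpha smoothed unigram model.

    The unigram model is a toy stand-in for a learned question generator.
    """
    counts = Counter(tok for passage in context_pair for tok in tokenize(passage))
    total = sum(counts.values())
    return sum(
        math.log((counts[tok] + alpha) / (total + alpha * vocab_size))
        for tok in tokenize(question)
    )


def select_context_pair(question, passages, alpha=0.1):
    """Return the index pair (i, j), i < j, that maximizes the question likelihood."""
    vocab = {tok for p in passages for tok in tokenize(p)} | set(tokenize(question))
    return max(
        combinations(range(len(passages)), 2),
        key=lambda ij: question_log_likelihood(
            question, (passages[ij[0]], passages[ij[1]]), len(vocab), alpha
        ),
    )


if __name__ == "__main__":
    # Made-up passages for illustration only.
    passages = [
        "alice wrote the novel silver river",
        "silver river was adapted into a film directed by bob",
        "the capital of france is paris",
        "carol plays the violin in an orchestra",
    ]
    question = "who directed the film adapted from the novel alice wrote"
    print(select_context_pair(question, passages))
```

On this made-up example, the two passages that jointly cover the question's tokens receive the highest score. A discriminative selector, by contrast, estimates p(pair | question) directly and can reach a confident decision from a few biased surface cues without accounting for the whole question.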