Learning Answer Generation using Supervision from Automatic Question Answering Evaluators
This work addresses the challenge of enhancing answer generation in QA systems, particularly for applications requiring accurate and reliable responses, though its contribution is incremental in that it reuses existing evaluation models as a training signal.
The paper tackles the problem of improving Generation-based QA (GenQA) models by proposing a novel training paradigm that uses supervision from automatic QA evaluation models (GAVA), resulting in significant improvements in answering accuracy on academic and industrial datasets.
Recent studies show that sentence-level extractive QA, i.e., QA based on Answer Sentence Selection (AS2), is outperformed by Generation-based QA (GenQA) models, which generate answers from the top-k answer sentences ranked by AS2 models, in the style of retrieval-augmented generation. In this paper, we propose a novel training paradigm for GenQA using supervision from automatic QA evaluation models (GAVA). Specifically, we propose three strategies to transfer knowledge from these QA evaluation models to a GenQA model: (i) augmenting the training data with answers generated by the GenQA model and labelled by GAVA statically, before training; (ii) performing the same augmentation dynamically, at every training epoch; and (iii) using the GAVA score to weight the generator loss while training the GenQA model. We evaluate our proposed methods on two academic datasets and one industrial dataset, obtaining a significant improvement in answering accuracy over the previous state of the art.
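The three strategies above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the 0.5 acceptance threshold, and the choice to multiply each example's negative log-likelihood by its evaluator score are all assumptions made for illustration, and the generator and evaluator are passed in as hypothetical callables.

```python
import math
from typing import Callable, List, Sequence, Tuple


def sequence_nll(token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of a target answer from its per-token probabilities."""
    return -sum(math.log(p) for p in token_probs)


def score_weighted_loss(
    batch_token_probs: Sequence[Sequence[float]],
    scores: Sequence[float],
) -> float:
    """Strategy (iii), as assumed here: scale each example's NLL by its
    evaluator score, then average over the batch."""
    if len(batch_token_probs) != len(scores):
        raise ValueError("need exactly one score per example")
    weighted = [s * sequence_nll(tp) for tp, s in zip(batch_token_probs, scores)]
    return sum(weighted) / len(weighted)


def augment_with_evaluator(
    questions: Sequence[str],
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Strategies (i) and (ii): generate an answer per question, score it with
    the evaluator, and keep (question, answer, score) triples at or above the
    threshold as extra training examples."""
    kept = []
    for question in questions:
        answer = generate(question)
        score = evaluate(question, answer)
        if score >= threshold:
            kept.append((question, answer, score))
    return kept
```

Under these assumptions, static augmentation corresponds to calling `augment_with_evaluator` once before training, while dynamic augmentation calls it again at the start of every epoch with the current generator, so the labelled answers track the model as it improves.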