CL IR Oct 11, 2024

Retrieving Contextual Information for Long-Form Question Answering using Weak Supervision

Philipp Christmann, Svitlana Vakulenko, Ionut Teodor Sorodoc, Bill Byrne, Adrià de Gispert

arXiv:2410.08623v1 12.923 citations h-index: 8 EMNLP

Originality: Incremental advance

AI Analysis

The paper addresses the lack of training data and the suboptimal retrieval of contextual information in long-form QA. The advance is incremental, since it builds on existing methods.

The paper tackles the problem of retrieving contextual information for long-form question answering by proposing weak supervision techniques, resulting in a 14.7% improvement in relevant page recall and a 12.5% increase in answer groundedness.

Long-form question answering (LFQA) aims at generating in-depth answers to end-user questions, providing relevant information beyond the direct answer. However, existing retrievers are typically optimized towards information that directly targets the question, missing out on such contextual information. Furthermore, there is a lack of training data for relevant context. To this end, we propose and compare different weak supervision techniques to optimize retrieval for contextual information. Experiments demonstrate improvements on the end-to-end QA performance on ASQA, a dataset for long-form question answering. Importantly, as more contextual information is retrieved, we improve the relevant page recall for LFQA by 14.7% and the groundedness of generated long-form answers by 12.5%. Finally, we show that long-form answers often anticipate likely follow-up questions, via experiments on a conversational QA dataset.
