CLIR
Oct 12, 2021

Attention-guided Generative Models for Extractive Question Answering

arXiv:2110.06393v1
17 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient, hallucination-free extractive QA for NLP applications. The advance is incremental, since it builds on existing generative models.

The paper applies generative models to extractive question answering by using decoder cross-attention patterns to obtain answer spans. On open-domain datasets such as NaturalQuestions and TriviaQA it approaches state-of-the-art performance while using far fewer parameters.

We propose a novel method for applying Transformer models to extractive question answering (QA) tasks. Recently, pretrained generative sequence-to-sequence (seq2seq) models have achieved great success in question answering. Contributing to the success of these models are internal attention mechanisms such as cross-attention. We propose a simple strategy to obtain an extractive answer span from the generative model by leveraging the decoder cross-attention patterns. Viewing cross-attention as an architectural prior, we apply joint training to further improve QA performance. Empirical results show that on open-domain question answering datasets like NaturalQuestions and TriviaQA, our method approaches state-of-the-art performance on both generative and extractive inference, all while using much fewer parameters. Furthermore, this strategy allows us to perform hallucination-free inference while conferring significant improvements to the model's ability to rerank relevant passages.
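The abstract does not spell out how a span is read off the cross-attention patterns, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes a head-averaged attention matrix `attn[t][i]` (decoder step `t`, source token `i`) and uses a hypothetical heuristic: the first decoder step points at the span start and the last decoder step points at the span end. The function names and the toy weights are illustrative.

```python
from typing import List, Tuple


def span_from_cross_attention(attn: List[List[float]]) -> Tuple[int, int]:
    """Pick an inclusive (start, end) source span from cross-attention weights.

    attn[t][i] is the attention that decoder step t places on source token i.
    Assumed heuristic (not taken from the paper): the first decoder step
    marks the span start, the last decoder step marks the span end.
    """
    if not attn or not attn[0]:
        raise ValueError("attention matrix must be non-empty")
    first, last = attn[0], attn[-1]
    start = max(range(len(first)), key=first.__getitem__)
    # Restrict the end to positions at or after the start so the span is valid.
    end = max(range(start, len(last)), key=last.__getitem__)
    return start, end


def extract_answer(source_tokens: List[str], attn: List[List[float]]) -> str:
    """Copy the selected span verbatim from the source tokens."""
    start, end = span_from_cross_attention(attn)
    return " ".join(source_tokens[start:end + 1])


# Toy example with made-up attention weights (not model output).
tokens = ["born", "in", "new", "york", "city", "in", "1990"]
toy_attn = [
    [0.05, 0.05, 0.60, 0.10, 0.10, 0.05, 0.05],  # first decoder step
    [0.05, 0.05, 0.10, 0.60, 0.10, 0.05, 0.05],  # middle decoder step
    [0.05, 0.05, 0.10, 0.10, 0.60, 0.05, 0.05],  # last decoder step
]
print(extract_answer(tokens, toy_attn))
```

Because the answer is copied from the source tokens rather than taken from the decoder's generated text, it cannot contain words absent from the passage, which is the sense in which extractive inference of this kind avoids hallucination.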

