CV | Sep 26, 2025

A Comprehensive Evaluation of Transformer-Based Question Answering Models and RAG-Enhanced Design

arXiv:2509.21845v1 | h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses multi-hop question answering, a challenging task for NLP applications, but the advance is incremental: it extends existing retrieval-augmented generation methods with hybrid retrieval techniques.

This paper tackles multi-hop reasoning in question answering by evaluating retrieval strategies within a retrieval-augmented generation framework. On the HotpotQA dataset, a hybrid method achieves relative improvements of 50% in exact match and 47% in F1 score over a cosine-similarity baseline.
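For readers unfamiliar with the two metrics, the following is a minimal sketch of exact match and token-level F1, assuming the SQuAD-style answer normalization commonly used for HotpotQA (lowercasing, stripping punctuation and English articles). The paper's actual scoring script is not shown here, so the function names and normalization details are assumptions.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, e.g. 0.5 means 50%."""
    return (new - baseline) / baseline
```

A relative improvement is measured against the baseline score, not in absolute points: a hypothetical move from 0.20 to 0.30 is a 50% relative gain. Those two numbers are illustrative only and are not the paper's absolute scores.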

Transformer-based models have advanced the field of question answering, but multi-hop reasoning, where answers require combining evidence across multiple passages, remains difficult. This paper presents a comprehensive evaluation of retrieval strategies for multi-hop question answering within a retrieval-augmented generation framework. We compare cosine similarity, maximal marginal relevance, and a hybrid method that integrates dense embeddings with lexical overlap and re-ranking. To further improve retrieval, we adapt the EfficientRAG pipeline for query optimization, introducing token labeling and iterative refinement while maintaining efficiency. Experiments on the HotpotQA dataset show that the hybrid approach substantially outperforms baseline methods, achieving a relative improvement of 50 percent in exact match and 47 percent in F1 score compared to cosine similarity. Error analysis reveals that hybrid retrieval improves entity recall and evidence complementarity, while remaining limited in handling distractors and temporal reasoning. Overall, the results suggest that hybrid retrieval-augmented generation provides a practical zero-shot solution for multi-hop question answering, balancing accuracy, efficiency, and interpretability.
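The three retrieval strategies named in the abstract can be sketched as follows. This is an illustrative approximation, not the paper's implementation: the trade-off parameter `lam`, the mixing weight `alpha`, the whitespace tokenizer, and the simple weighted sum in `hybrid_rank` are all assumptions, and the paper's re-ranking stage and EfficientRAG query optimization are not reproduced.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_cosine(query_vec, doc_vecs, k: int) -> List[int]:
    """Baseline: indices of the k passages most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]


def mmr(query_vec, doc_vecs, k: int, lam: float = 0.5) -> List[int]:
    """Maximal marginal relevance: greedily trade relevance against redundancy."""
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def lexical_overlap(query: str, passage: str) -> float:
    """Fraction of query tokens that also appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def hybrid_rank(query: str, query_vec, passages, doc_vecs,
                k: int, alpha: float = 0.5) -> List[int]:
    """Hypothetical hybrid: weighted sum of dense and lexical scores."""
    scores = [alpha * cosine(query_vec, dv) + (1 - alpha) * lexical_overlap(query, p)
              for p, dv in zip(passages, doc_vecs)]
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

In a multi-hop setting, plain cosine ranking tends to return several near-duplicate passages about the same entity. MMR penalizes that redundancy, and the lexical term can surface a passage that mentions a bridge entity by name even when its embedding sits farther from the query.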

