Shumin Shi

h-index: 8
2 papers

2 Papers

CL · Oct 6, 2022
U3E: Unsupervised and Erasure-based Evidence Extraction for Machine Reading Comprehension

Suzhe He, Shumin Shi, Chenghao Wu

More tasks in Machine Reading Comprehension (MRC) require, in addition to answer prediction, the extraction of evidence sentences that support the answer. However, annotating supporting evidence sentences is usually time-consuming and labor-intensive. In this paper, to address this issue and considering that most existing extraction methods are semi-supervised, we propose an unsupervised evidence extraction method (U3E). U3E takes the changes after sentence-level feature erasure in the document as input, simulating the decline in problem-solving ability caused by human memory decline. In order to make selections on the basis of fully understanding the semantics of the original text, we also propose metrics to quickly select the optimal memory model for these input changes. To compare U3E with typical evidence extraction methods and investigate its effectiveness in evidence extraction, we conduct experiments on different datasets. Experimental results show that U3E is simple but effective, not only extracting evidence more accurately, but also significantly improving model performance.

CL · Sep 1, 2025
Do Retrieval Augmented Language Models Know When They Don't Know?

Youchao Zhou, Heyan Huang, Yicheng Liu et al.

Existing large language models (LLMs) occasionally generate plausible yet factually incorrect responses, known as hallucinations. Two main approaches have been proposed to mitigate hallucinations: retrieval-augmented language models (RALMs) and refusal post-training. However, current research predominantly focuses on their individual effectiveness while overlooking the evaluation of the refusal capability of RALMs. Ideally, if RALMs know when they do not know, they should refuse to answer. In this study, we ask the fundamental question: Do RALMs know when they don't know? Specifically, we investigate three questions. First, are RALMs well calibrated with respect to different internal and external knowledge states? We examine the influence of various factors. Contrary to expectations, when all retrieved documents are irrelevant, RALMs still tend to refuse questions they could have answered correctly. Next, given the model's pronounced over-refusal behavior, we raise a second question: How does a RALM's refusal ability align with its calibration quality? Our results show that the over-refusal problem can be mitigated through in-context fine-tuning. However, we observe that improved refusal behavior does not necessarily imply better calibration or higher overall accuracy. Finally, we ask: Can we combine refusal-aware RALMs with uncertainty-based answer abstention to mitigate over-refusal? We develop a simple yet effective refusal mechanism for refusal-post-trained RALMs that improves their overall answer quality by balancing refusal and correct answers. Our study provides a more comprehensive understanding of the factors influencing RALM behavior. Meanwhile, we emphasize that uncertainty estimation for RALMs remains an open problem deserving deeper investigation.