CL AI, Mar 11, 2025

Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations

Danielle Villa, Maria Chang, Keerthiram Murugesan, Rosario Uceda-Sosa, Karthikeyan Natesan Ramamurthy

arXiv:2503.08815v1, 2.7, h-index: 33

Originality: Incremental advance

AI Analysis

This work addresses the issue of inaccurate or misleading explanations from LLMs, which is critical for enhancing transparency and trust in AI systems, though it is incremental in nature.

The paper tackles the problem of evaluating the consistency of explanations generated by large language models. It introduces a method that combines symbolic information extraction with language model-driven question generation, which produces better follow-up questions than LLMs alone and offers more flexibility and variety in question generation.

Large Language Models (LLMs) are often asked to explain their outputs to enhance accuracy and transparency. However, evidence suggests that these explanations can misrepresent the models' true reasoning processes. One effective way to identify inaccuracies or omissions in these explanations is through consistency checking, which typically involves asking follow-up questions. This paper introduces cross-examiner, a new method for generating follow-up questions based on a model's explanation of an initial question. Our method combines symbolic information extraction with language model-driven question generation, resulting in better follow-up questions than those produced by LLMs alone. Additionally, this approach is more flexible than other methods and can generate a wider variety of follow-up questions.
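
The abstract does not give implementation details, so the following is only a toy sketch of the general idea of consistency checking with follow-up questions, not the paper's method. It assumes a regular expression can stand in for symbolic information extraction and a fixed template can stand in for language model-driven question generation. All names here (`extract_triples`, `make_followups`, `find_inconsistencies`, and the `answer_fn` callback) are hypothetical.

```python
import re
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

# Assumption: explanations contain simple "X is/are/causes/requires Y" clauses.
# A real system would use a proper information extraction component.
_CLAUSE = re.compile(
    r"^\s*(?P<subj>.+?)\s+(?P<rel>is|are|causes|requires)\s+(?P<obj>.+?)\s*$",
    re.IGNORECASE,
)


def extract_triples(explanation: str) -> List[Triple]:
    """Extract (subject, relation, object) triples from an explanation."""
    triples = []
    for sentence in re.split(r"[.!?]", explanation):
        match = _CLAUSE.match(sentence)
        if match:
            triples.append(
                (match["subj"], match["rel"].lower(), match["obj"])
            )
    return triples


def make_followups(triples: List[Triple]) -> List[str]:
    """Turn each triple into a yes/no follow-up question.

    A template stands in for the language model here; every question is
    phrased so that an answer consistent with the explanation is "yes".
    """
    return [f"Is it true that {s} {r} {o}?" for s, r, o in triples]


def find_inconsistencies(
    questions: List[str], answer_fn: Callable[[str], str]
) -> List[str]:
    """Return the follow-up questions whose answer contradicts the explanation."""
    return [
        q for q in questions
        if not answer_fn(q).strip().lower().startswith("yes")
    ]


if __name__ == "__main__":
    explanation = "Ice is frozen water. Heat causes melting."
    questions = make_followups(extract_triples(explanation))

    # Hypothetical model under test: it contradicts its own claim about heat.
    def answer_fn(question: str) -> str:
        return "No" if "Heat" in question else "Yes"

    for q in find_inconsistencies(questions, answer_fn):
        print("Inconsistent:", q)
```

In this sketch, `answer_fn` is where the model being examined would be queried. Any follow-up that it answers with something other than "yes" is flagged as inconsistent with its earlier explanation.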
