CLAILGOct 3, 2023

Ask Again, Then Fail: Large Language Models' Vacillations in Judgment

arXiv:2310.02174v537 citationsh-index: 12
Originality Incremental advance
AI Analysis

This addresses a critical issue for users relying on AI assistants for consistent and trustworthy responses, though it is an incremental improvement on existing methods.

The paper tackles the problem of conversational language models wavering in their judgments when asked follow-up questions, which undermines reliability and trust. They introduce a training framework called Unwavering-FQ that teaches models to maintain correct judgments, showing effectiveness in experiments.

We observe that current conversational language models often waver in their judgments when faced with follow-up questions, even if the original judgment was correct. This wavering presents a significant challenge for generating reliable responses and building user trust. To comprehensively assess this issue, we introduce a \textsc{Follow-up Questioning Mechanism} along with two metrics to quantify this inconsistency, confirming its widespread presence in current language models. To mitigate this issue, we explore various prompting strategies for closed-source models; moreover, we develop a training-based framework \textsc{Unwavering-FQ} that teaches language models to maintain their originally correct judgments through synthesized high-quality preference data. Our experimental results confirm the effectiveness of our framework and its ability to enhance the general capabilities of models.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes