CL · AI · Oct 15, 2024

Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction

Tencent
arXiv:2410.12040v1 · 12 citations · h-index: 62 · NAACL
Originality: Incremental advance
AI Analysis

This work addresses the robustness of reasoning in LLMs, an incremental advance for AI evaluation and reliability.

The authors address unreliable reasoning in Large Language Models by building the Concept-Reversed Winograd Schema Challenge (CR-WSC) dataset, on which LLM performance drops significantly, and by proposing Abstraction-of-Thought (AoT) prompting to improve robustness.

While Large Language Models (LLMs) have showcased remarkable proficiency in reasoning, there is still a concern about hallucinations and unreliable reasoning issues due to semantic associations and superficial logical chains. To evaluate the extent to which LLMs perform robust reasoning instead of relying on superficial logical chains, we propose a new evaluation dataset, the Concept-Reversed Winograd Schema Challenge (CR-WSC), based on the famous Winograd Schema Challenge (WSC) dataset. By simply reversing the concepts to those that are more associated with the wrong answer, we find that the performance of LLMs drops significantly despite the rationale of reasoning remaining the same. Furthermore, we propose Abstraction-of-Thought (AoT), a novel prompt method for recovering adversarial cases to normal cases using conceptual abstraction to improve LLMs' robustness and consistency in reasoning, as demonstrated by experiments on CR-WSC.
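The abstraction idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the prompt wording, and the hand-written entity-to-concept mapping are all assumptions made for illustration (the abstract does not say how the abstraction step is produced), and the example sentence is the classic WSC trophy/suitcase item rather than an item from CR-WSC. The sketch only shows the general shape: swap concrete entities for neutral concept labels so that the pronoun must be resolved from the relation between roles, not from word associations.

```python
import re


def abstract_entities(sentence: str, entity_to_concept: dict[str, str]) -> str:
    """Replace each concrete entity with an abstract concept label in one pass."""
    if not entity_to_concept:
        return sentence
    lookup = {entity.lower(): concept for entity, concept in entity_to_concept.items()}
    # Longest entities first so multi-word entities win over their substrings.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in alternatives) + r")\b",
        flags=re.IGNORECASE,
    )
    # A single substitution pass avoids re-matching text inside inserted labels.
    return pattern.sub(lambda match: lookup[match.group(0).lower()], sentence)


def build_aot_prompt(
    sentence: str,
    pronoun: str,
    candidates: list[str],
    entity_to_concept: dict[str, str],
) -> str:
    """Build a hypothetical abstraction-style prompt (wording is an assumption)."""
    abstracted = abstract_entities(sentence, entity_to_concept)
    options = [abstract_entities(c, entity_to_concept) for c in candidates]
    lines = [
        "Resolve the pronoun using only the abstract roles, not word associations.",
        f"Sentence: {abstracted}",
        f'Pronoun: "{pronoun}"',
        "Options: " + " / ".join(options),
        "Answer with exactly one option.",
    ]
    return "\n".join(lines)


# Illustrative input: the classic WSC item, with a hand-written mapping.
example_mapping = {"trophy": "item A", "suitcase": "container B"}
prompt = build_aot_prompt(
    "The trophy doesn't fit in the suitcase because it is too big.",
    "it",
    ["trophy", "suitcase"],
    example_mapping,
)
print(prompt)
```

In practice the mapping would presumably come from the model itself or from an annotation step rather than being written by hand; the returned string would then be sent to whichever LLM is under evaluation.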

Code Implementations: 1 repo
