CL · Mar 14, 2025

Neutralizing Bias in LLM Reasoning using Entailment Graphs

arXiv:2503.11614v1 · 3 citations · h-index: 62 · Has Code · ACL
Originality: Incremental advance
AI Analysis

The paper addresses a specific bias issue in LLM reasoning that is relevant to NLP researchers, but it is an incremental advance because it builds on known problems and methods.

The paper tackles hallucinations that LLMs produce in Natural Language Inference because of attestation bias. It shows that an unsupervised framework for constructing counterfactual reasoning data and fine-tuning LLMs on that data significantly reduces these hallucinations and improves inferential performance on both original and bias-neutralized NLI datasets.

LLMs are often claimed to be capable of Natural Language Inference (NLI), which is widely regarded as a cornerstone of more complex forms of reasoning. However, recent works show that LLMs still suffer from hallucinations in NLI due to attestation bias, where LLMs overly rely on propositional memory to build shortcuts. To solve the issue, we design an unsupervised framework to construct counterfactual reasoning data and fine-tune LLMs to reduce attestation bias. To measure bias reduction, we build bias-adversarial variants of NLI datasets with randomly replaced predicates in premises while keeping hypotheses unchanged. Extensive evaluations show that our framework can significantly reduce hallucinations from attestation bias. Then, we further evaluate LLMs fine-tuned with our framework on original NLI datasets and their bias-neutralized versions, where original entities are replaced with randomly sampled ones. Extensive results show that our framework consistently improves inferential performance on both original and bias-neutralized NLI datasets.
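The two dataset transformations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The `NLIExample` layout (a shared entity pair plus one predicate each for premise and hypothesis), the function names, and the toy pools are all hypothetical. The sketch also assumes that a randomly chosen premise predicate no longer supports the hypothesis, so the bias-adversarial label is set to non-entailment, and that swapping entities leaves the label unchanged; the abstract states neither point explicitly.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NLIExample:
    # Hypothetical layout: premise and hypothesis share one entity pair.
    subject: str
    obj: str
    premise_pred: str
    hypothesis_pred: str
    label: bool  # True if the premise entails the hypothesis

    def premise(self) -> str:
        return f"{self.subject} {self.premise_pred} {self.obj}"

    def hypothesis(self) -> str:
        return f"{self.subject} {self.hypothesis_pred} {self.obj}"


def bias_adversarial(ex: NLIExample, predicate_pool: list[str],
                     rng: random.Random) -> NLIExample:
    """Replace the premise predicate with a random one; keep the hypothesis.

    Assumption: the random premise does not support the hypothesis, so the
    label becomes non-entailment. A model that still answers "entailed" is
    likely relying on what it remembers about the hypothesis.
    """
    candidates = [p for p in predicate_pool
                  if p not in (ex.premise_pred, ex.hypothesis_pred)]
    return replace(ex, premise_pred=rng.choice(candidates), label=False)


def bias_neutralized(ex: NLIExample, entity_pool: list[str],
                     rng: random.Random) -> NLIExample:
    """Replace both entities with randomly sampled ones; keep predicates.

    Assumption: the label depends only on the predicates, so it is kept.
    """
    new_subject, new_obj = rng.sample(entity_pool, 2)
    return replace(ex, subject=new_subject, obj=new_obj)


if __name__ == "__main__":
    rng = random.Random(0)
    original = NLIExample("Google", "YouTube", "acquired", "owns", True)
    predicates = ["acquired", "owns", "criticized", "visited"]
    entities = ["Acme", "Globex", "Initech"]

    adversarial = bias_adversarial(original, predicates, rng)
    neutral = bias_neutralized(original, entities, rng)
    for ex in (original, adversarial, neutral):
        print(f"{ex.premise()} => {ex.hypothesis()} [{ex.label}]")
```

In this toy setup the adversarial variant keeps a hypothesis the model may have memorized as true while removing its support from the premise, and the neutralized variant keeps the inference pattern while removing entities the model may have seen together.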

Code Implementations: 1 repo
