CL | Aug 23, 2025

Unbiased Reasoning for Knowledge-Intensive Tasks in Large Language Models via Conditional Front-Door Adjustment

Bo Zhao, Yinghao Zhang, Ziqi Xu, Yongli Ren, Xiuzhen Zhang, Renqiang Luo, Zaiwen Feng, Feng Xia

arXiv:2508.16910v1 | 8 citations | h-index: 4 | CIKM

Originality: Incremental advance

AI Analysis

This paper addresses biased reasoning in LLMs on knowledge-intensive tasks with a causal approach that is presented as more robust and generalizable. The advance appears incremental, since it builds on existing causal methods such as front-door adjustment.

The paper tackles internal bias in large language models (LLMs) during knowledge-intensive reasoning tasks. It proposes a causal prompting framework, Conditional Front-Door Prompting (CFD-Prompting), which uses counterfactual external knowledge to enable unbiased estimation of the causal effect between the query and the answer. The authors report that it significantly outperforms existing baselines in accuracy and robustness across multiple LLMs and datasets.

Large Language Models (LLMs) have shown impressive capabilities in natural language processing but still struggle to perform well on knowledge-intensive tasks that require deep reasoning and the integration of external knowledge. Although methods such as Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) have been proposed to enhance LLMs with external knowledge, they still suffer from internal bias in LLMs, which often leads to incorrect answers. In this paper, we propose a novel causal prompting framework, Conditional Front-Door Prompting (CFD-Prompting), which enables the unbiased estimation of the causal effect between the query and the answer, conditional on external knowledge, while mitigating internal bias. By constructing counterfactual external knowledge, our framework simulates how the query behaves under varying contexts, addressing the challenge that the query is fixed and is not amenable to direct causal intervention. Compared to the standard front-door adjustment, the conditional variant operates under weaker assumptions, enhancing both robustness and generalisability of the reasoning process. Extensive experiments across multiple LLMs and benchmark datasets demonstrate that CFD-Prompting significantly outperforms existing baselines in both accuracy and robustness.
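The abstract does not spell out the adjustment formula. As a rough illustration only, the sketch below uses a commonly stated form of conditional front-door adjustment, P(y | do(x), w) = Σ_m P(m | x, w) Σ_x' P(y | x', m, w) P(x' | w), and checks it on a toy binary model. The symbol names (x for the query, w for external knowledge, m for a mediator such as a reasoning chain, y for the answer, u for hidden internal bias), the helper names, and the toy model itself are assumptions made for this example, not the paper's implementation or experiments.

```python
import itertools
import random


def conditional_front_door(joint, x, y, w):
    """Estimate P(y | do(x), w) from an observational joint P(x, m, y, w).

    `joint` maps (x, m, y, w) tuples of 0/1 values to probabilities.
    Assumed formula (not taken from the abstract):
        sum_m P(m | x, w) * sum_x' P(y | x', m, w) * P(x' | w)
    """
    def p(**fixed):
        # Probability of the event described by the fixed variable values.
        mass = 0.0
        for (xv, mv, yv, wv), pr in joint.items():
            values = {"x": xv, "m": mv, "y": yv, "w": wv}
            if all(values[k] == v for k, v in fixed.items()):
                mass += pr
        return mass

    estimate = 0.0
    for m in (0, 1):
        p_m = p(x=x, m=m, w=w) / p(x=x, w=w)  # P(m | x, w)
        inner = 0.0
        for x2 in (0, 1):
            p_y = p(x=x2, m=m, y=y, w=w) / p(x=x2, m=m, w=w)  # P(y | x', m, w)
            p_x2 = p(x=x2, w=w) / p(w=w)  # P(x' | w)
            inner += p_y * p_x2
        estimate += p_m * inner
    return estimate


def toy_model(seed=0):
    """Hypothetical binary model: u -> x, u -> y, w -> x, m, y, and x -> m -> y.

    Returns the observational joint over (x, m, y, w), with the hidden
    confounder u summed out, plus a function giving the true P(y | do(x), w).
    """
    rng = random.Random(seed)

    def r():
        return rng.uniform(0.1, 0.9)  # keep every probability away from 0 and 1

    def bern(p1, v):
        return p1 if v == 1 else 1.0 - p1

    p_u, p_w = r(), r()
    p_x = {(u, w): r() for u in (0, 1) for w in (0, 1)}  # P(X=1 | u, w)
    p_m = {(x, w): r() for x in (0, 1) for w in (0, 1)}  # P(M=1 | x, w)
    p_y = {(m, u, w): r() for m in (0, 1) for u in (0, 1) for w in (0, 1)}

    joint = {}
    for u, w, x, m, y in itertools.product((0, 1), repeat=5):
        pr = (bern(p_u, u) * bern(p_w, w) * bern(p_x[u, w], x)
              * bern(p_m[x, w], m) * bern(p_y[m, u, w], y))
        joint[(x, m, y, w)] = joint.get((x, m, y, w), 0.0) + pr

    def truth(x, y, w):
        # Intervening on x cuts the u -> x edge, so u keeps its prior P(u).
        return sum(bern(p_m[x, w], m) * bern(p_u, u) * bern(p_y[m, u, w], y)
                   for m in (0, 1) for u in (0, 1))

    return joint, truth


if __name__ == "__main__":
    joint, truth = toy_model(seed=0)
    for x, w in itertools.product((0, 1), repeat=2):
        est = conditional_front_door(joint, x, 1, w)
        print(f"x={x} w={w} estimate={est:.6f} truth={truth(x, 1, w):.6f}")
```

In this toy setting the estimate is computed only from observed variables, yet it matches the interventional quantity even though u is never observed. That is the property the abstract describes as unbiased estimation conditional on external knowledge.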
