CL · AI · Dec 12, 2025

CIP: A Plug-and-Play Causal Prompting Framework for Mitigating Hallucinations under Long-Context Noise

Qingsen Ma, Dianyun Wang, Ran Jing, Yujun Sun, Zhenbo Xu

arXiv:2512.11282v1 · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the critical issue of hallucinations in LLMs, which matters to users who rely on accurate information retrieval, though it appears incremental: a prompting enhancement layered on existing models.

The paper tackles hallucinations in large language models that process long, noisy retrieval contexts. It proposes CIP, a plug-and-play causal prompting framework that improves factual grounding through causal reasoning. Experiments across seven models, including GPT-4o, show gains of 2.6 points in Attributable Rate and 0.38 in Causal Consistency Score, along with up to a 55.1% reduction in response latency.

Large language models often hallucinate when processing long, noisy retrieval contexts because they rely on spurious correlations rather than genuine causal relationships. We propose CIP, a lightweight, plug-and-play causal prompting framework that mitigates hallucinations at the input stage. CIP constructs a causal relation sequence among entities, actions, and events and injects it into the prompt to guide reasoning toward causally relevant evidence. Through causal intervention and counterfactual reasoning, CIP suppresses non-causal reasoning paths, improving factual grounding and interpretability. Experiments across seven mainstream language models, including GPT-4o, Gemini 2.0 Flash, and Llama 3.1, show that CIP consistently enhances reasoning quality and reliability, achieving a 2.6-point improvement in Attributable Rate, a 0.38 improvement in Causal Consistency Score, and a fourfold increase in effective information density. API-level profiling further shows that CIP accelerates contextual understanding and reduces end-to-end response latency by up to 55.1 percent. These results suggest that causal reasoning may serve as a promising paradigm for improving the explainability, stability, and efficiency of large language models.
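The abstract describes CIP's input-stage mechanism only at a high level. The Python sketch below is a rough illustration of what injecting a causal relation sequence into a prompt could look like. Everything in it is hypothetical: the `CausalLink` type, the `format_causal_sequence` and `build_causal_prompt` helpers, the arrow notation, and the instruction wording are assumptions, not the paper's template. The links are also supplied by hand here, whereas the abstract says CIP constructs them itself. The counterfactual sentence in the prompt is only a prompt-level stand-in for the causal intervention and counterfactual reasoning that the abstract mentions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CausalLink:
    """Hypothetical cause -> effect edge between entities, actions, or events."""
    cause: str
    effect: str
    relation: str = "causes"


def format_causal_sequence(links: Sequence[CausalLink]) -> str:
    """Render the links as a numbered, ordered causal relation sequence."""
    return "\n".join(
        f"{i}. {link.cause} --{link.relation}--> {link.effect}"
        for i, link in enumerate(links, start=1)
    )


def build_causal_prompt(question: str, context: str, links: Sequence[CausalLink]) -> str:
    """Place the causal sequence ahead of the noisy retrieved context.

    The instruction wording is an illustrative assumption, not CIP's template.
    """
    return (
        "Causal relation sequence:\n"
        f"{format_causal_sequence(links)}\n\n"
        # Assumed prompt-level approximation of counterfactual filtering.
        "Use only evidence that lies on one of these causal links. For each link, "
        "ask whether the answer would still hold if the cause had not occurred, "
        "and disregard passages that merely co-occur with the question.\n\n"
        f"Retrieved context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    example_links = [
        CausalLink("heavy rainfall", "river flooding"),
        CausalLink("river flooding", "road closure", relation="leads to"),
    ]
    print(build_causal_prompt(
        question="Why was the road closed?",
        context=(
            "[doc 1] Heavy rainfall hit the valley on Monday.\n"
            "[doc 2] A local bakery opened a new branch."
        ),
        links=example_links,
    ))
```

Putting the causal sequence before the retrieved passages is one plausible design choice, because the model reads the intended reasoning chain before it meets the noisy context. The paper may order, phrase, or structure these components differently.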

