Two Birds with One Stone: Improving Factuality and Faithfulness of LLMs via Dynamic Interactive Subspace Editing
This work addresses a critical limitation in LLM deployment by mitigating two key hallucination types concurrently, whereas existing methods handle them separately and incur trade-offs between them.
The paper tackles factuality and faithfulness hallucinations in LLMs. It shows that the two hallucination types share overlapping subspaces in neural representations and proposes SPACE, a unified framework that enhances both jointly through dynamic interactive subspace editing, reporting superior results across multiple benchmark datasets.
LLMs have demonstrated unprecedented capabilities in natural language processing, yet their practical deployment remains hindered by persistent factuality and faithfulness hallucinations. While existing methods address these hallucination types independently, they inadvertently induce performance trade-offs, as interventions targeting one type often exacerbate the other. Through empirical and theoretical analysis of activation space dynamics in LLMs, we reveal that these hallucination categories share overlapping subspaces within neural representations, presenting an opportunity for concurrent mitigation. To harness this insight, we propose SPACE, a unified framework that jointly enhances factuality and faithfulness by editing shared activation subspaces. SPACE establishes a geometric foundation for shared subspace existence through dual-task feature modeling, then identifies and edits these subspaces via a hybrid probe strategy combining spectral clustering and attention head saliency scoring. Experimental results across multiple benchmark datasets demonstrate the superiority of our approach.
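To make the idea of editing a shared activation subspace concrete, the following is a minimal sketch and not the SPACE implementation. It assumes that probe directions for factuality and for faithfulness are already available as linearly independent vectors, and it swaps in a standard principal-angle computation (an SVD of the product of two orthonormal bases) in place of the paper's hybrid probe strategy; spectral clustering and attention head saliency scoring are not reproduced here. All function names, the `cos_threshold` and `alpha` parameters, and the toy vectors are hypothetical.

```python
import numpy as np


def orthonormal_basis(directions):
    """Orthonormal basis (as columns) for the span of the given row vectors.

    Assumes the rows are linearly independent.
    """
    q, _ = np.linalg.qr(np.asarray(directions, dtype=float).T)
    return q


def shared_subspace(basis_a, basis_b, cos_threshold=0.9):
    """Directions in span(basis_a) that lie close to span(basis_b).

    The singular values of basis_a.T @ basis_b are the cosines of the
    principal angles between the two subspaces. Directions whose cosine
    reaches cos_threshold (a hypothetical cutoff) are treated as shared.
    """
    u, cosines, _ = np.linalg.svd(basis_a.T @ basis_b)
    keep = cosines >= cos_threshold
    return basis_a @ u[:, keep], cosines


def edit_activation(activation, shared_basis, steer, alpha=1.0):
    """Shift an activation along the part of `steer` inside the shared subspace."""
    projected = shared_basis @ (shared_basis.T @ steer)
    return activation + alpha * projected


# Toy 4-dimensional example: the two tasks overlap only along the first axis.
factuality_dirs = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
faithfulness_dirs = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])

shared, cosines = shared_subspace(
    orthonormal_basis(factuality_dirs),
    orthonormal_basis(faithfulness_dirs),
)
edited = edit_activation(np.zeros(4), shared, steer=np.ones(4), alpha=2.0)
print(shared.shape, edited)
```

In this toy setup only the first coordinate of the activation changes, because the steering vector is projected onto the single direction that both task subspaces share before it is added. Restricting the edit to that overlap illustrates how one intervention could serve both objectives, although how SPACE actually selects and scales its edits is not specified in the abstract.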