Diagnosing Knowledge Conflict in Multimodal Long-Chain Reasoning
This work addresses a critical problem for the reliability of multimodal reasoning systems, though its contribution is incremental, since it builds on existing knowledge-conflict concepts.
The paper tackles failures of multimodal large language models during long chain-of-thought reasoning that arise when knowledge sources conflict. Analysis of internal representations shows that conflict types are linearly separable, localized in mid-to-late layers, and asymmetrically manipulable, which enables principled diagnosis and control of these failures.
Multimodal large language models (MLLMs) in long chain-of-thought reasoning often fail when different knowledge sources provide conflicting signals. We formalize these failures under a unified notion of knowledge conflict, distinguishing input-level objective conflict from process-level effective conflict. Through probing internal representations, we reveal that: (I) Linear Separability: different conflict types are explicitly encoded as linearly separable features rather than entangled; (II) Depth Localization: conflict signals concentrate in mid-to-late layers, indicating a distinct processing stage for conflict encoding; (III) Hierarchical Consistency: aggregating noisy token-level signals along trajectories robustly recovers input-level conflict types; and (IV) Directional Asymmetry: reinforcing the model's implicit source preference under conflict is far easier than enforcing the opposite source. Our findings provide a mechanism-level view of multimodal reasoning under knowledge conflict and enable principled diagnosis and control of long-CoT failures.
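To make finding (III) concrete, the following is a minimal sketch of how aggregating noisy token-level predictions along a trajectory can recover the input-level conflict type. It does not reproduce the paper's method: the conflict-type labels, the simulated probe outputs, the independent-noise model, and the choice of majority voting as the aggregator are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical conflict-type labels; the abstract does not enumerate the types.
CONFLICT_TYPES = ["type_a", "type_b", "type_c"]


def noisy_token_predictions(true_type, n_tokens, token_accuracy, rng):
    """Simulate per-token probe outputs for one reasoning trajectory.

    Assumption: each token's prediction is independently correct with
    probability `token_accuracy`; otherwise it is a random wrong type.
    """
    others = [t for t in CONFLICT_TYPES if t != true_type]
    return [
        true_type if rng.random() < token_accuracy else rng.choice(others)
        for _ in range(n_tokens)
    ]


def aggregate_trajectory(token_predictions):
    """Majority vote over the token-level predictions of one trajectory."""
    return Counter(token_predictions).most_common(1)[0][0]


def evaluate(n_trajectories=200, n_tokens=300, token_accuracy=0.5, seed=0):
    """Return (token-level accuracy, trajectory-level accuracy) on simulated data."""
    rng = random.Random(seed)
    token_hits = token_total = trajectory_hits = 0
    for _ in range(n_trajectories):
        true_type = rng.choice(CONFLICT_TYPES)
        preds = noisy_token_predictions(true_type, n_tokens, token_accuracy, rng)
        token_hits += sum(p == true_type for p in preds)
        token_total += len(preds)
        trajectory_hits += aggregate_trajectory(preds) == true_type
    return token_hits / token_total, trajectory_hits / n_trajectories


token_acc, trajectory_acc = evaluate()
print(f"token-level accuracy:      {token_acc:.2f}")
print(f"trajectory-level accuracy: {trajectory_acc:.2f}")
```

In this toy setting the vote is far more accurate than any single token because errors are independent and the correct type is the most likely prediction at each token. Real token-level errors along a chain of thought are likely correlated, so the gain from aggregation in practice may be smaller than this simulation suggests.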