Why Thinking Hurts: Diagnosing and Rectifying Linguistic Inertia in Large Language Models for Recommendation
For practitioners using LLMs for recommendation, this work diagnoses a critical failure mode of Chain-of-Thought (CoT) reasoning and proposes a fix, enabling more reliable use of thinking mode in recommender systems.
The paper identifies 'Linguistic Inertia' in LLM-based recommenders, a failure mode in which Chain-of-Thought reasoning can degrade recommendation quality by up to 25%. It proposes LICD, a training-free framework that outperforms both no-thinking and original-thinking baselines on three benchmarks.
Chain-of-Thought (CoT) reasoning is widely used to improve LLM performance, and recent foundation recommender models adopt it by generating textual reasoning before predicting target items represented by Semantic IDs (SIDs). However, we observe that enabling thinking mode in models such as OpenOneRec can degrade recommendation quality by up to 25%. We investigate this failure and identify Linguistic Inertia: when a textual CoT segment is inserted before SID generation, the model relies more on the natural-language context and less on historical SID evidence. Further analyses show that this effect is amplified by reduced access to historical information and by longer CoTs. To mitigate it, we propose Linguistic-Inertia-Calibrated Decoding (LICD), a training-free framework that combines Reasoning-Chain Compression and Bias-Subtracted Contrastive Inference. Experiments on three large-scale benchmarks show that LICD consistently outperforms both no-thinking and original-thinking baselines. Our code is available at https://anonymous.4open.science/r/LICD-4573.
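The abstract does not specify how Bias-Subtracted Contrastive Inference is computed. The sketch below shows only the generic contrastive-decoding idea that the name suggests, under stated assumptions. It scores each candidate SID token by its log-probability under the full context (history plus CoT) minus a weighted log-probability under a hypothetical "bias" context that carries only the language prior. The names `bias_subtracted_scores`, `pick_token`, and the weight `alpha` are illustrative assumptions. They are not the authors' API or their exact formulation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def bias_subtracted_scores(
    full_logits: Sequence[float],
    bias_logits: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Hypothetical contrastive score: log p_full - alpha * log p_bias.

    full_logits: next-token logits given user history + CoT (assumption).
    bias_logits: next-token logits from a context assumed to capture only
                 the linguistic prior, e.g. CoT without history (assumption).
    alpha:       strength of the bias subtraction (illustrative value).
    """
    if len(full_logits) != len(bias_logits):
        raise ValueError("both logit vectors must cover the same SID vocabulary")
    lp_full = log_softmax(full_logits)
    lp_bias = log_softmax(bias_logits)
    return [f - alpha * b for f, b in zip(lp_full, lp_bias)]


def pick_token(
    full_logits: Sequence[float],
    bias_logits: Sequence[float],
    alpha: float = 0.5,
) -> int:
    """Greedy choice of the SID token index with the highest contrastive score."""
    scores = bias_subtracted_scores(full_logits, bias_logits, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    full = [2.0, 1.9, 0.1]  # token 0 narrowly wins under the full context
    bias = [3.0, 0.0, 0.0]  # ...but token 0 is also strongly favored by the bias context
    print(pick_token(full, bias, alpha=0.0))  # plain greedy decoding
    print(pick_token(full, bias, alpha=0.5))  # bias-subtracted decoding
```

In the toy example, plain greedy decoding (`alpha=0.0`) picks token 0. Subtracting the bias term shifts the choice to token 1, whose support comes from the full context rather than the language prior alone. Practical contrastive decoders usually also restrict the candidates to tokens that are already plausible under the full context, so the subtraction cannot promote low-probability tokens. That refinement is omitted here for brevity.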