The Oracle and The Prism: A Decoupled and Efficient Framework for Generative Recommendation Explanation
This work addresses inefficient and suboptimal explanation generation in recommender systems, a problem for both users and developers. It offers a novel decoupled approach that builds incrementally on existing methods.
The paper tackles the performance-efficiency trade-off in LLM-based explainable recommendation by proposing Prism, a decoupled framework that separates ranking from explanation generation. The resulting 140M-parameter model outperforms its 11B-parameter teacher in faithfulness and personalization, with a 24-fold inference speedup and a 10-fold reduction in memory.
The integration of Large Language Models (LLMs) into explainable recommendation systems often leads to a performance-efficiency trade-off in end-to-end architectures, where joint optimization of ranking and explanation can result in suboptimal compromises. To resolve this, we propose Prism, a novel decoupled framework that rigorously separates the recommendation process into a dedicated ranking stage and an explanation generation stage. Inspired by knowledge distillation, Prism leverages a powerful teacher LLM (e.g., FLAN-T5-XXL) as an Oracle to produce high-fidelity explanatory knowledge. A compact, fine-tuned student model (e.g., BART-Base), the Prism, then specializes in synthesizing this knowledge into personalized explanations. This decomposition ensures that each component is optimized for its specific objective, eliminating the conflicts inherent in coupled models. Extensive experiments on benchmark datasets demonstrate that our 140M-parameter Prism model significantly outperforms its 11B-parameter teacher in human evaluations of faithfulness and personalization, while achieving a 24-fold speedup and a 10-fold reduction in memory consumption during inference. These results validate that decoupling, combined with targeted distillation, provides an efficient and effective pathway to high-quality explainable recommendation.
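The decoupled flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `rank_items`, `build_distillation_set`, `toy_score`, and `toy_oracle` are hypothetical, and the toy scorer and toy Oracle merely stand in for a trained ranker and a teacher LLM such as FLAN-T5-XXL. The sketch shows only the separation of concerns: ranking never sees explanation text, and the Oracle is called offline to produce targets on which a compact student (e.g., BART-Base) would later be fine-tuned.

```python
from dataclasses import dataclass
from typing import Callable, List

Scorer = Callable[[List[str], str], float]   # (user history, item) -> relevance
Oracle = Callable[[List[str], str], str]     # (user history, item) -> explanation


@dataclass
class Example:
    """One distillation pair: the student's input context and its target text."""
    user_history: List[str]
    item: str
    explanation: str  # produced by the Oracle (teacher), used as the training target


def rank_items(user_history: List[str], candidates: List[str], score: Scorer) -> List[str]:
    """Stage 1 (ranking only): order candidates by relevance, highest first."""
    return sorted(candidates, key=lambda item: score(user_history, item), reverse=True)


def build_distillation_set(histories: List[List[str]], candidates: List[str],
                           score: Scorer, oracle: Oracle, top_k: int = 1) -> List[Example]:
    """Stage 2 data (offline): have the Oracle explain each user's top-k ranked items."""
    data = []
    for history in histories:
        for item in rank_items(history, candidates, score)[:top_k]:
            data.append(Example(history, item, oracle(history, item)))
    return data


# --- Hypothetical stand-ins, for illustration only ---
def toy_score(user_history: List[str], item: str) -> float:
    """Assumed scorer: number of words the item shares with the user's history."""
    seen = {tok for title in user_history for tok in title.split()}
    return float(len(seen & set(item.split())))


def toy_oracle(user_history: List[str], item: str) -> str:
    """Assumed teacher: a template in place of a real LLM call."""
    return f"Recommended {item} because you liked {user_history[-1]}."


history = ["space opera", "space western"]
candidates = ["cooking show", "space documentary"]
pairs = build_distillation_set([history], candidates, toy_score, toy_oracle)
for p in pairs:
    print(p.item, "->", p.explanation)
```

In a real setup under these assumptions, the resulting pairs would serve as the fine-tuning corpus for the student, and only the ranker plus the small student would run at inference time, which is where the reported speed and memory savings would come from.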