Collaborative Distillation for Top-N Recommendation
This work provides a method to compress recommender models for faster inference, which is incremental as it adapts existing distillation techniques to a specific domain.
The paper tackles the challenge of applying knowledge distillation to recommender systems by proposing collaborative distillation, which addresses sparsity and ranking issues, resulting in performance improvements of 2.7-33.2% in hit rate and 2.7-29.1% in NDCG over state-of-the-art methods.
Knowledge distillation (KD) is a well-known method to reduce inference latency by compressing a cumbersome teacher model to a small student model. Despite the success of KD in the classification task, applying KD to recommender models is challenging due to the sparsity of positive feedback, the ambiguity of missing feedback, and the ranking problem associated with the top-N recommendation. To address the issues, we propose a new KD model for the collaborative filtering approach, namely collaborative distillation (CD). Specifically, (1) we reformulate a loss function to deal with the ambiguity of missing feedback. (2) We exploit probabilistic rank-aware sampling for the top-N recommendation. (3) To train the proposed model effectively, we develop two training strategies for the student model, called the teacher- and the student-guided training methods, selecting the most useful feedback from the teacher model. Via experimental results, we demonstrate that the proposed model outperforms the state-of-the-art method by 2.7-33.2% and 2.7-29.1% in hit rate (HR) and normalized discounted cumulative gain (NDCG), respectively. Moreover, the proposed model achieves the performance comparable to the teacher model.