IR AI LG · Aug 26, 2024

Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems

Nikhil Khani, Shuo Yang, Aniruddh Nath, Yang Liu, Pendo Abbo, Li Wei, Shawn Andrews, Maciej Kula, Jarrod Kahn, Zhe Zhao, Lichan Hong, Ed Chi

arXiv:2408.14678v14.0

Originality: Incremental advance

AI Analysis

The paper addresses practical problems in deploying efficient models for latency-sensitive recommender systems, but the advance is incremental because it adapts existing methods to a specific domain.

This paper tackles overlooked challenges in applying Knowledge Distillation to online ranking systems, such as data distribution shifts and teacher configuration optimization, and demonstrates significant improvements in student model performance through live experiments on large-scale personalized video recommendation systems at Google.

Knowledge Distillation (KD) is a powerful approach for compressing a large model into a smaller, more efficient model, particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring consistent and reliable generation of high-quality teacher labels from a continuous stream of data.
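As a rough illustration of the general KD idea the abstract refers to, the sketch below blends a loss on the observed label with a loss on a teacher's soft label for a binary engagement prediction. This is a generic textbook-style formulation, not the system described in the paper; the function names, the `alpha` weighting, and the binary cross-entropy objective are all assumptions made for illustration.

```python
import math


def binary_cross_entropy(target, prob, eps=1e-7):
    """Cross-entropy between a target in [0, 1] and a predicted probability."""
    # Clamp the prediction so log() never receives 0.
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def distillation_loss(hard_label, teacher_prob, student_prob, alpha=0.5):
    """Hypothetical blended loss: (1 - alpha) * hard-label loss + alpha * teacher-label loss.

    hard_label:   observed outcome (0.0 or 1.0), e.g. whether the user clicked
    teacher_prob: the teacher model's predicted probability (the "soft" label)
    student_prob: the student model's predicted probability
    alpha:        assumed mixing weight between the two loss terms
    """
    hard_loss = binary_cross_entropy(hard_label, student_prob)
    soft_loss = binary_cross_entropy(teacher_prob, student_prob)
    return (1.0 - alpha) * hard_loss + alpha * soft_loss


# Example: the user clicked, the teacher predicted 0.8, the student predicts 0.6.
loss = distillation_loss(hard_label=1.0, teacher_prob=0.8, student_prob=0.6)
```

With `alpha=0.0` the student learns only from observed labels; with `alpha=1.0` it learns only from the teacher. How a production system weights or structures these terms is not stated in the abstract.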
