Always Strengthen Your Strengths: A Drift-Aware Incremental Learning Framework for CTR Prediction
This paper addresses distribution drift in streaming data for industrial CTR prediction, offering an incremental improvement over existing methods.
The paper tackles catastrophic forgetting in click-through rate (CTR) prediction for recommendation systems by proposing a drift-aware incremental learning framework based on ensemble learning, which outperforms baselines in offline experiments and A/B tests.
Click-through rate (CTR) prediction is of great importance in recommendation systems and online advertising platforms. In industrial serving scenarios, the user-generated data observed by the CTR model typically arrives as a stream. In streaming data, the underlying distribution drifts over time, and earlier distributions may recur. If the model simply keeps adapting to the newest data distribution, this can lead to catastrophic forgetting. It is also inefficient to relearn a distribution that has already occurred. Due to memory constraints and the diversity of data distributions in large-scale industrial applications, conventional strategies against catastrophic forgetting, such as replay, parameter isolation, and knowledge distillation, are difficult to deploy. In this work, we design a novel drift-aware incremental learning framework based on ensemble learning to address catastrophic forgetting in CTR prediction. With explicit error-based drift detection on streaming data, the framework further strengthens well-adapted ensembles and freezes ensembles that do not match the input distribution, avoiding catastrophic interference. Both offline experiments and an A/B test show that our method outperforms all baselines considered.
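
The strengthen-or-freeze idea described above can be illustrated with a small toy sketch. It is not the paper's implementation: the class names (`OnlineLogReg`, `DriftAwareEnsemble`), the use of per-batch log loss as the error signal, and the rule of updating only the single lowest-error member are all assumptions made for illustration. Each incoming batch is scored by every ensemble member, the best-matching member is trained on it, and all others are left frozen.

```python
import math
import random


class OnlineLogReg:
    """Tiny logistic regression trained by SGD; a hypothetical stand-in for one CTR sub-model."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp the logit so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def log_loss(self, batch):
        eps = 1e-7
        total = 0.0
        for x, y in batch:
            p = min(1.0 - eps, max(eps, self.predict(x)))
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        return total / len(batch)

    def update(self, batch):
        for x, y in batch:
            g = self.predict(x) - y  # gradient of the log loss w.r.t. the logit
            self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * g


class DriftAwareEnsemble:
    """Assumed rule: train only the member with the lowest error on the batch."""

    def __init__(self, n_members, dim):
        self.members = [OnlineLogReg(dim) for _ in range(n_members)]

    def step(self, batch):
        # Error-based check: score the batch with every member before training.
        losses = [m.log_loss(batch) for m in self.members]
        best = min(range(len(losses)), key=losses.__getitem__)
        # Strengthen the best-matching member; every other member stays frozen.
        self.members[best].update(batch)
        return best


def make_batch(rng, flipped, size=200):
    """Synthetic clicks: label is 1 when x > 0, or when x <= 0 if `flipped`."""
    batch = []
    for _ in range(size):
        x = rng.uniform(-1.0, 1.0)
        clicked = (x > 0) != flipped
        batch.append(([x], 1 if clicked else 0))
    return batch


def run_demo(seed=0):
    """Stream distribution A, then its flipped version B, then A again."""
    rng = random.Random(seed)
    ensemble = DriftAwareEnsemble(n_members=3, dim=1)
    chosen = []
    for flipped in [False] * 5 + [True] * 5 + [False] * 5:
        chosen.append(ensemble.step(make_batch(rng, flipped)))
    return chosen
```

Calling `run_demo()` returns the index of the member trained on each batch. In this toy stream, a member fitted to the first distribution scores worse than an untrained member once the labels flip, so a different member takes over. When the first distribution recurs, the original member is reused with its weights intact rather than relearned. A production system would need a softer rule (for example, a tolerance band that strengthens several well-adapted members), which this sketch does not attempt.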