Adaptive Low-Precision Training for Embeddings in Click-Through Rate Prediction
This work addresses the efficiency and cost challenges of training and deploying CTR models for large-scale recommendation systems. It is an incremental advance in quantization techniques.
The paper tackles the problem of compressing the large embedding tables in click-through rate prediction models. It introduces adaptive low-precision training, which learns the quantization step size via gradient descent to reduce accuracy loss. Experiments on two real-world datasets show that the method trains 8-bit embeddings without sacrificing prediction accuracy and significantly improves accuracy at extremely low bit widths.
Embedding tables in click-through rate (CTR) prediction models are usually huge. To train and deploy CTR models efficiently and economically, it is necessary to compress their embedding tables at the training stage. To this end, we formulate a novel quantization training paradigm, termed low-precision training (LPT), that compresses the embeddings from the training stage onward. We also provide a theoretical analysis of its convergence. The results show that, in LPT, stochastic weight quantization has a faster convergence rate and a smaller convergence error than deterministic weight quantization. Further, to reduce the accuracy degradation, we propose adaptive low-precision training (ALPT), which learns the step size (i.e., the quantization resolution) through gradient descent. Experiments on two real-world datasets confirm our analysis and show that ALPT can significantly improve the prediction accuracy, especially at extremely low bit widths. For the first time in CTR models, we successfully train 8-bit embeddings without sacrificing prediction accuracy. The code of ALPT is publicly available.
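The contrast between deterministic and stochastic weight quantization can be illustrated with a minimal sketch. This is not the released ALPT code: the function name `quantize`, its parameters, and the signed symmetric integer range are assumptions made for illustration. The step size is held fixed here, whereas ALPT would learn it through gradient descent.

```python
import math
import random


def quantize(w, step, bits=8, stochastic=True, rng=random):
    """Quantize a float weight to a signed `bits`-bit code and map it back.

    Hypothetical helper for illustration; `step` is the quantization
    resolution (fixed in this sketch, learned in ALPT).
    """
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scaled = w / step
    if stochastic:
        # Round up with probability equal to the fractional part, so the
        # rounding is unbiased in expectation (before clipping).
        lower = math.floor(scaled)
        code = lower + (1 if rng.random() < scaled - lower else 0)
    else:
        # Deterministic round-to-nearest.
        code = math.floor(scaled + 0.5)
    # Clip to the representable integer range.
    code = max(qmin, min(qmax, code))
    return code * step


if __name__ == "__main__":
    rng = random.Random(0)
    w, step = 0.26, 0.1
    det = quantize(w, step, stochastic=False)
    sto = [quantize(w, step, stochastic=True, rng=rng) for _ in range(10000)]
    print("deterministic:", det)
    print("stochastic mean:", sum(sto) / len(sto))
```

Deterministic rounding always maps a given weight to the same grid point, so its rounding error is a fixed bias for that weight. Stochastic rounding picks one of the two neighboring grid points at random, and the average over many draws approaches the original weight.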