LG · Jul 30, 2021

Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction

arXiv:2107.14432v6
AI Analysis

This work targets computational efficiency and model compression for CTR prediction systems. Its contribution is incremental: it integrates sparse group lasso regularization into existing adaptive optimizers.

The authors address neural network efficiency for click-through rate prediction by developing a new class of adaptive optimizers that incorporate sparse group lasso regularization. Compared with baseline optimizers followed by magnitude pruning, the new optimizers perform significantly better at the same sparsity level. Compared with unpruned baselines, they reach extremely high sparsity while remaining competitive or better.

We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, and AdaHessian. This creates a new class of optimizers, named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad, Group AdaHessian, and so on, accordingly. We establish convergence guarantees in the stochastic convex setting, based on primal-dual methods. We evaluate the regularization effect of our new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results show that, compared with the original optimizers followed by a magnitude-pruning post-processing step, our optimizers significantly improve model performance at the same sparsity level. Furthermore, compared with the original optimizers without magnitude pruning, our methods achieve extremely high sparsity with significantly better or highly competitive performance. The code is available at https://github.com/intelligent-machine-learning/tfplus/tree/main/tfplus.
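To make the regularizer concrete, the following is a minimal NumPy sketch of a sparse group lasso proximal step next to a magnitude-pruning baseline. It is not the authors' primal-dual update, which lives in the linked repository. It assumes each row of a weight matrix (for example, one embedding vector) forms a group, and the names `sparse_group_lasso_prox`, `magnitude_prune_rows`, `row_sparsity`, `l1`, `l21`, and `step` are hypothetical, chosen only for illustration.

```python
import numpy as np


def sparse_group_lasso_prox(W, l1, l21, step):
    """Proximal step for step * (l1 * ||W||_1 + l21 * sum_g sqrt(d) * ||W_g||_2).

    Assumption: each row of W is one group of dimension d = W.shape[1].
    """
    # Element-wise soft-thresholding handles the L1 part.
    S = np.sign(W) * np.maximum(np.abs(W) - step * l1, 0.0)
    # Group soft-thresholding shrinks each row and zeroes weak rows entirely.
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    thresh = step * l21 * np.sqrt(W.shape[1])
    scale = np.maximum(1.0 - thresh / np.maximum(norms, 1e-12), 0.0)
    return S * scale


def magnitude_prune_rows(W, sparsity):
    """Baseline sketch: zero the fraction `sparsity` of rows with smallest L2 norm."""
    norms = np.linalg.norm(W, axis=1)
    k = int(round(sparsity * W.shape[0]))
    out = W.copy()
    if k > 0:
        out[np.argsort(norms)[:k]] = 0.0
    return out


def row_sparsity(W):
    """Fraction of rows that are entirely zero."""
    return float(np.mean(~W.any(axis=1)))


# Toy weights: three groups (rows) of dimension two.
W = np.array([[0.05, -0.02], [1.0, -2.0], [0.3, 0.0]])
P = sparse_group_lasso_prox(W, l1=0.1, l21=0.1, step=1.0)
M = magnitude_prune_rows(W, sparsity=1 / 3)
```

In this toy case both routes remove the first row, but they differ in kind: the proximal step shrinks and removes groups as part of each update, whereas magnitude pruning is applied once to weights that were trained without any sparsity pressure. That difference is the contrast the abstract draws between the Group optimizers and the pruned baselines.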
