LG AI DC OC | Nov 27, 2023

MAST: Model-Agnostic Sparsified Training

Yury Demidovich, Grigory Malinovsky, Egor Shulgin, Peter Richtárik

arXiv:2311.16086v26.64 citations | h-index: 18 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the computational efficiency of machine learning training. Its model-agnostic framework is an incremental advance, but it bridges theory and practice for techniques such as Dropout and sparse training.

The paper tackles inefficient model training by proposing a sparsified optimization approach that incorporates a pre-trained model and random sketch operators. It reports tighter convergence rates and relaxed assumptions compared with traditional methods.

We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function. Unlike traditional formulations, the proposed approach explicitly incorporates an initially pre-trained model and random sketch operators, allowing for sparsification of both the model and gradient during training. We establish insightful properties of the proposed objective function and highlight its connections to the standard formulation. Furthermore, we present several variants of the Stochastic Gradient Descent (SGD) method adapted to the new problem formulation, including SGD with general sampling, a distributed version, and SGD with variance reduction techniques. We achieve tighter convergence rates and relax assumptions, bridging the gap between theoretical principles and practical applications, covering several important techniques such as Dropout and Sparse training. This work presents promising opportunities to enhance the theoretical understanding of model training through a sparsification-aware optimization approach.
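To make the formulation concrete, here is a minimal sketch of one way such a sparsification-aware objective could be set up. It assumes the sketched loss has the form f(v + S(x - v)), where v is the pre-trained model and S is a random diagonal sketch with E[S] = I. That notation, the Bernoulli sketch, the toy quadratic loss, and all function names are illustrative assumptions and are not taken from the summary above.

```python
import random


def bernoulli_sketch(d, p, rng):
    """Diagonal of an assumed sketch S: keep each coordinate with
    probability p and rescale by 1/p so that E[S] = I."""
    return [1.0 / p if rng.random() < p else 0.0 for _ in range(d)]


def sketched_grad(grad_f, x, v, s):
    """Gradient of x -> f(v + S(x - v)) for a diagonal S = diag(s).

    The model is sparsified around the pre-trained point v, and the
    gradient is sparsified by the same sketch (S^T = S when diagonal).
    """
    y = [vi + si * (xi - vi) for xi, vi, si in zip(x, v, s)]
    g = grad_f(y)
    return [si * gi for si, gi in zip(s, g)]


def sparsified_sgd(grad_f, v, p, step, iters, seed=0):
    """Plain SGD on the sketched objective, started at the pre-trained v."""
    rng = random.Random(seed)
    x = list(v)
    for _ in range(iters):
        s = bernoulli_sketch(len(x), p, rng)  # fresh sketch every step
        g = sketched_grad(grad_f, x, v, s)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def quad_grad(y, target=(1.0, 3.0), scale=(4.0, 1.0)):
    """Gradient of the toy loss f(y) = 0.5 * sum_i scale_i * (y_i - target_i)^2."""
    return [c * (yi - t) for yi, t, c in zip(y, target, scale)]
```

With p = 1 the sketch is the identity and the loop reduces to ordinary gradient descent on f. With p < 1 each step touches only a random subset of coordinates, in the spirit of Dropout, and the iterates target the expected sketched loss rather than f itself.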
