ML LG | Feb 17, 2022

An alternative approach to train neural networks using monotone variational inequality

arXiv:2202.08876v45.32 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses efficient neural network training with convergence guarantees, particularly for fine-tuning pre-trained models such as large language models. It is an incremental advance because it builds on existing monotone variational inequality methods.

The paper proposes an alternative approach to neural network training based on monotone variational inequalities. Inspired by prior work on generalized linear models, it reduces a non-convex problem to a convex one. The approach performs competitively with, or better than, stochastic gradient descent when training fully-connected, graph, and convolutional neural networks. It converges fast and comes with guarantees in special cases such as single-layer networks or fine-tuning pre-trained models.

We propose an alternative approach to neural network training using the monotone vector field, an idea inspired by the seminal work of Juditsky and Nemirovski [Juditsky & Nemirovski, 2019], developed originally to solve parameter estimation problems for generalized linear models (GLM) by reducing the original non-convex problem to a convex problem of solving a monotone variational inequality (VI). Our approach leads to computationally efficient procedures that converge fast and offer guarantees in some special cases, such as training a single-layer neural network or fine-tuning the last layer of a pre-trained model. Our approach can be used for more efficient fine-tuning of a pre-trained model while freezing the bottom layers, an essential step for deploying many machine learning models such as large language models (LLM). We demonstrate its applicability in training fully-connected (FC) neural networks, graph neural networks (GNN), and convolutional neural networks (CNN), and show that our approach performs competitively with or better than stochastic gradient descent methods on both synthetic and real network data prediction tasks across various performance metrics.
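
To make the GLM-style reduction concrete, the following is a minimal sketch of a monotone-VI update for a single-layer model with a sigmoid link. It is not the authors' released code. The function names (`vi_field`, `fit_vi`, `demo`), the step size, the iteration count, and the noiseless synthetic data are all assumptions chosen for illustration. The sketch uses the empirical vector field F(θ) = (1/N) Σ xᵢ (φ(xᵢᵀθ) − yᵢ), which differs from the squared-loss gradient in that it omits the φ′ factor and is monotone whenever φ is non-decreasing.

```python
import math
import random


def sigmoid(t):
    """Monotone link function phi (numerically stable logistic)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def vi_field(theta, xs, ys):
    """Empirical field F(theta) = (1/N) * sum_i x_i * (phi(x_i . theta) - y_i).

    Unlike the gradient of the squared loss, there is no phi' factor here.
    """
    n = len(xs)
    field = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        residual = sigmoid(dot(x, theta)) - y
        for j, xj in enumerate(x):
            field[j] += xj * residual / n
    return field


def fit_vi(xs, ys, step=1.0, iters=1000):
    """Plain fixed-step iteration theta <- theta - step * F(theta).

    The step size and iteration count are illustrative assumptions.
    """
    theta = [0.0] * len(xs[0])
    for _ in range(iters):
        f = vi_field(theta, xs, ys)
        theta = [t - step * g for t, g in zip(theta, f)]
    return theta


def demo(seed=0, n=400):
    """Recover a hypothetical weight vector from noiseless synthetic data."""
    rng = random.Random(seed)
    true_theta = [1.5, -2.0, 0.5]
    xs = [[rng.gauss(0.0, 1.0) for _ in true_theta] for _ in range(n)]
    # Noiseless responses, so F(true_theta) = 0 exactly.
    ys = [sigmoid(dot(x, true_theta)) for x in xs]
    return true_theta, fit_vi(xs, ys)


if __name__ == "__main__":
    true_theta, estimate = demo()
    print("true:", true_theta)
    print("estimate:", [round(v, 4) for v in estimate])
```

In the fine-tuning setting described above, the inputs `xs` would presumably be the features produced by the frozen bottom layers, so only the last-layer weights are updated. That mapping is an interpretation of the abstract, not a detail confirmed by it.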
