LG, NA, ML · Nov 11, 2024

Generative Feature Training of Thin 2-Layer Networks

arXiv:2411.06848v26.44 citations · h-index: 14 · Has Code · Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

The paper addresses non-convex optimization in shallow neural networks for small-scale applications. The advance is incremental because it builds on existing generative and regularization techniques.

The paper tackles the training of 2-layer neural networks with few hidden weights on small datasets, a setting where gradient-based methods often get stuck in local minima. It initializes the hidden weights with samples from a learned generative model and then refines them with gradient-based post-processing. Numerical examples demonstrate the effectiveness of the approach.

We consider the approximation of functions by 2-layer neural networks with a small number of hidden weights based on the squared loss and small datasets. Due to the highly non-convex energy landscape, gradient-based training often suffers from local minima. As a remedy, we initialize the hidden weights with samples from a learned proposal distribution, which we parameterize as a deep generative model. To train this model, we exploit the fact that with fixed hidden weights, the optimal output weights solve a linear equation. After learning the generative model, we refine the sampled weights with a gradient-based post-processing in the latent space. Here, we also include a regularization scheme to counteract potential noise. Finally, we demonstrate the effectiveness of our approach by numerical examples.
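The abstract's observation that, for fixed hidden weights, the optimal output weights solve a linear equation can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `tanh` activation, the small ridge term `reg`, the toy target function, the helper name `fit_output_weights`, and the Gaussian draw of `W`, which merely stands in for samples from the learned proposal distribution.

```python
import numpy as np

def fit_output_weights(X, y, W, reg=1e-8):
    """Solve for the output weights b given fixed hidden weights W.

    Features: Phi[i, j] = tanh(<W[j], X[i]>)  (activation is an assumption).
    b minimizes ||Phi b - y||^2 + reg * ||b||^2 via the normal equations.
    """
    Phi = np.tanh(X @ W.T)                      # shape (n_samples, n_hidden)
    A = Phi.T @ Phi + reg * np.eye(W.shape[0])  # shape (n_hidden, n_hidden)
    b = np.linalg.solve(A, Phi.T @ y)           # linear system, no gradient descent
    return b, Phi

rng = np.random.default_rng(0)

# Small toy dataset (hypothetical target function).
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

# Hidden weights drawn from a Gaussian; a stand-in for the learned proposal.
W = rng.normal(scale=2.0, size=(8, 2))

b, Phi = fit_output_weights(X, y, W)
mse = float(np.mean((Phi @ b - y) ** 2))
print(f"hidden units: {W.shape[0]}, training MSE: {mse:.4f}")
```

Because the inner problem is solved in closed form, the loss can be treated as a function of the hidden weights alone, which is what makes it possible to score candidate weight samples when training a proposal distribution.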

