LG | NA | ML | Nov 11, 2024

Generative Feature Training of Thin 2-Layer Networks

arXiv:2411.06848v2 | 4 citations | h-index: 14 | Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of non-convex optimization when training shallow neural networks for small-scale applications. The contribution is incremental, as it builds on existing generative and regularization techniques.

The paper tackles the training of 2-layer neural networks with few hidden weights on small datasets, a setting in which gradient-based methods often get stuck in local minima. It initializes the hidden weights with samples from a learned generative model, refines them with gradient-based post-processing, and demonstrates the effectiveness of this approach through numerical examples.

We consider the approximation of functions by 2-layer neural networks with a small number of hidden weights based on the squared loss and small datasets. Due to the highly non-convex energy landscape, gradient-based training often suffers from local minima. As a remedy, we initialize the hidden weights with samples from a learned proposal distribution, which we parameterize as a deep generative model. To train this model, we exploit the fact that with fixed hidden weights, the optimal output weights solve a linear equation. After learning the generative model, we refine the sampled weights with a gradient-based post-processing in the latent space. Here, we also include a regularization scheme to counteract potential noise. Finally, we demonstrate the effectiveness of our approach by numerical examples.
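
The abstract's key observation, that the optimal output weights solve a linear equation once the hidden weights are fixed, can be illustrated with a minimal sketch. This is not the authors' implementation: the `tanh` activation, the small `ridge` term, the toy target function, and all function names are assumptions made for illustration, and the hidden weights are drawn from a Gaussian as a stand-in for samples from the learned generative model.

```python
import numpy as np

def hidden_features(X, W, b):
    """Feature matrix Phi[i, k] = tanh(<w_k, x_i> + b_k) (activation is an assumption)."""
    return np.tanh(X @ W.T + b)

def optimal_output_weights(X, y, W, b, ridge=1e-6):
    """Solve the (ridge-regularized) normal equations for the output weights.

    With W and b fixed, the squared loss is quadratic in the output weights,
    so its minimizer solves (Phi^T Phi + ridge * I) a = Phi^T y.
    The ridge term is a hypothetical stabilizer, not taken from the paper.
    """
    Phi = hidden_features(X, W, b)
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

def squared_loss(X, y, W, b, a):
    """Mean squared error of the 2-layer network with output weights a."""
    return float(np.mean((hidden_features(X, W, b) @ a - y) ** 2))

rng = np.random.default_rng(0)

# Small toy dataset (hypothetical target function).
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]

# Few hidden weights; Gaussian samples stand in for generator samples.
W = rng.normal(size=(8, 2))
b = rng.normal(size=8)

a_opt = optimal_output_weights(X, y, W, b)
loss_opt = squared_loss(X, y, W, b, a_opt)

# Any perturbation of the output weights should not decrease the loss.
a_perturbed = a_opt + 0.1 * rng.normal(size=8)
loss_perturbed = squared_loss(X, y, W, b, a_perturbed)
```

Because the inner problem has a closed-form solution, the loss can be treated as a function of the hidden weights alone, which is what makes it possible to score candidate hidden weights when training a proposal distribution over them.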

Code Implementations: 1 repo
