ML, LG | May 7, 2019

A Generative Model for Sampling High-Performance and Diverse Weights for Neural Networks

arXiv:1905.02898v1 | 16 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient weight sampling and ensembling in neural networks, offering a method to enhance model performance with reduced computational cost.

The authors train a hypernetwork that maps latent vectors to points on a low-loss weight manifold, so that diverse, high-performance neural network weights can be sampled cheaply. Ensembling the sampled networks improves classification accuracy, and distilling the ensemble into a single classifier retains its generalization.

Recent work on mode connectivity in the loss landscape of deep neural networks has demonstrated that the locus of (sub-)optimal weight vectors lies on continuous paths. In this work, we train a neural network that serves as a hypernetwork, mapping a latent vector into high-performance (low-loss) weight vectors, generalizing recent findings of mode connectivity to higher dimensional manifolds. We formulate the training objective as a compromise between accuracy and diversity, where the diversity takes into account trivial symmetry transformations of the target network. We demonstrate how to reduce the number of parameters in the hypernetwork by parameter sharing. Once learned, the hypernetwork allows for a computationally efficient, ancestral sampling of neural network weights, which we recruit to form large ensembles. The improvement in classification accuracy obtained by this ensembling indicates that the generated manifold extends in dimensions other than directions implied by trivial symmetries. For computational efficiency, we distill an ensemble into a single classifier while retaining generalization.
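The sampling-and-ensembling step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy linear hypernetwork with random (untrained) parameters, a single-layer softmax classifier as the target network, a standard normal prior over the latent vector, and ensembling by averaging predicted class probabilities. All names (`make_hypernetwork`, `generate_weights`, `ensemble_predict`) and all sizes are hypothetical.

```python
import math
import random

LATENT_DIM = 4                   # size of the latent vector z (assumed)
N_IN, N_OUT = 3, 2               # toy target classifier: 3 features, 2 classes (assumed)
N_TARGET = N_IN * N_OUT + N_OUT  # target weights plus biases


def make_hypernetwork(rng):
    """Random linear map theta = A z + b, standing in for a trained hypernetwork."""
    A = [[rng.gauss(0.0, 0.5) for _ in range(LATENT_DIM)] for _ in range(N_TARGET)]
    b = [rng.gauss(0.0, 0.5) for _ in range(N_TARGET)]
    return A, b


def generate_weights(hyper, z):
    """Map a latent vector z to a flat weight vector for the target network."""
    A, b = hyper
    return [sum(a * zi for a, zi in zip(row, z)) + bi for row, bi in zip(A, b)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_predict(theta, x):
    """Run the target classifier using the generated flat weight vector theta."""
    W = [theta[i * N_IN:(i + 1) * N_IN] for i in range(N_OUT)]
    bias = theta[N_IN * N_OUT:]
    logits = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, bias)]
    return softmax(logits)


def ensemble_predict(hyper, x, n_members, rng):
    """Sample z from the prior, generate weights, and average class probabilities."""
    total = [0.0] * N_OUT
    for _ in range(n_members):
        z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
        probs = target_predict(generate_weights(hyper, z), x)
        total = [t + p for t, p in zip(total, probs)]
    return [t / n_members for t in total]


rng = random.Random(0)
hyper = make_hypernetwork(rng)
probs = ensemble_predict(hyper, [1.0, -0.5, 2.0], n_members=20, rng=rng)
```

Each ensemble member costs one pass through the hypernetwork and one through the target network, which is what makes large ensembles cheap once the hypernetwork has been trained. The accuracy and diversity objective used for training is not shown here.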
