LG, Dec 13, 2023

Principled Weight Initialization for Hypernetworks

arXiv:2312.08399v1, 89 citations, h-index: 21, ICLR
Originality: Incremental advance
AI Analysis

This paper addresses a foundational issue in hypernetwork optimization. The advance is incremental, but it matters for applications such as multi-task learning and Bayesian deep learning.

The paper tackles the problem of optimizing hypernetworks by developing principled weight initialization techniques, which yield more stable mainnet weights, lower training loss, and faster convergence.

Hypernetworks are meta neural networks that generate weights for a main neural network in an end-to-end differentiable manner. Despite extensive applications ranging from multi-task learning to Bayesian deep learning, the problem of optimizing hypernetworks has not been studied to date. We observe that classical weight initialization methods like Glorot & Bengio (2010) and He et al. (2015), when applied directly on a hypernet, fail to produce weights for the mainnet in the correct scale. We develop principled techniques for weight initialization in hypernets, and show that they lead to more stable mainnet weights, lower training loss, and faster convergence.
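
The scale mismatch described in the abstract can be illustrated with a small NumPy sketch. It assumes a bias-free linear hypernetwork head that produces flattened mainnet weights as W = H e, zero-mean independent entries for H and the embedding e, and a ReLU-style fan-in target variance of 2 / fan_in for the mainnet layer. The function name, the layer sizes, and the rescaling rule are illustrative assumptions derived from that simplified setting, not the paper's exact formulas.

```python
import numpy as np

def generated_weight_variance(hyper_var, embeddings, n_main_weights, rng):
    """Empirical variance of mainnet weights W = H e for a bias-free linear hypernet head."""
    d_e = embeddings.shape[1]
    # H maps an embedding of size d_e to a flattened mainnet weight tensor.
    H = rng.normal(0.0, np.sqrt(hyper_var), size=(n_main_weights, d_e))
    W = embeddings @ H.T  # one generated weight vector per embedding
    return W.var()

rng = np.random.default_rng(0)
fan_in_main, fan_out_main, d_e, var_e = 128, 64, 64, 1.0  # hypothetical sizes
embeddings = rng.normal(0.0, np.sqrt(var_e), size=(64, d_e))
n_main_weights = fan_in_main * fan_out_main

# Variance a fan-in (He-style) scheme would want for the mainnet layer itself.
target_var = 2.0 / fan_in_main

# Classical He init applied to the hypernet head uses the head's own fan-in (d_e),
# so the generated weights end up with variance about 2 * var_e, whatever fan_in_main is.
naive_var = generated_weight_variance(2.0 / d_e, embeddings, n_main_weights, rng)

# Assumed correction: choose Var(H) so that d_e * Var(H) * Var(e) equals the target.
fixed_var = generated_weight_variance(
    target_var / (d_e * var_e), embeddings, n_main_weights, rng
)

print(f"target {target_var:.4f}  naive {naive_var:.4f}  rescaled {fixed_var:.4f}")
```

Under these assumptions, the variance of each generated weight is d_e * Var(H) * Var(e). The classical scheme never sees the mainnet's fan-in, which is why the naive variance is far from the target, while the rescaled variant matches it.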

