LG | Oct 22, 2020

PHEW: Constructing Sparse Networks that Learn Fast and Generalize Well without Training Data

arXiv:2010.11354v2 | 25 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency in neural network training and inference for practitioners, though it is incremental as it builds on existing NTK-based methods.

The paper tackles the problem of designing sparse neural networks that converge faster and generalize better without using any training data. It proposes PHEW, which improves over prior data-agnostic methods such as SynFlow-L2 and achieves significant performance gains across a wide range of network densities.

Methods that sparsify a network at initialization are important in practice because they greatly improve the efficiency of both learning and inference. Our work is based on a recently proposed decomposition of the Neural Tangent Kernel (NTK) that decouples the dynamics of the training process into a data-dependent component and an architecture-dependent kernel, the latter referred to as the Path Kernel. That work has shown how to design sparse neural networks for faster convergence, without any training data, using the SynFlow-L2 algorithm. We first show that even though SynFlow-L2 is optimal in terms of convergence for a given network density, it results in sub-networks with "bottleneck" (narrow) layers, leading to poor performance compared to other data-agnostic methods that use the same number of parameters. We then propose a new method to construct sparse networks without any training data, referred to as Paths with Higher-Edge Weights (PHEW). PHEW is a probabilistic network formation method based on biased random walks that depends only on the initial weights. It has path kernel properties similar to those of SynFlow-L2 but generates much wider layers, resulting in better generalization and performance. PHEW achieves significant improvements over the data-independent SynFlow and SynFlow-L2 methods at a wide range of network densities.
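
The biased random walk idea described in the abstract can be sketched as follows. This is a minimal illustration for a fully connected network, not the authors' implementation. The function name `phew_like_mask`, the round-robin choice of the starting input unit, the transition probability proportional to the absolute initial weight, and the stopping rule are all assumptions made for the example; the abstract only states that the walks are biased and depend solely on the initial weights.

```python
import numpy as np


def phew_like_mask(weights, target_density, rng):
    """Build boolean keep-masks from input-to-output random walks.

    weights: list of 2-D arrays; weights[l] has shape (fan_in, fan_out)
             and weights[l].shape[1] == weights[l + 1].shape[0].
    target_density: fraction of all weights to keep, in (0, 1].
    rng: a numpy.random.Generator.

    Hypothetical sketch: each walk starts at an input unit and, at every
    layer, moves to a next-layer unit with probability proportional to
    the absolute value of the connecting initial weight (an assumption).
    Every edge a walk traverses is kept.
    """
    masks = [np.zeros(w.shape, dtype=bool) for w in weights]
    total = sum(w.size for w in weights)
    budget = int(round(target_density * total))
    n_inputs = weights[0].shape[0]

    walk_index = 0
    # A walk adds up to len(weights) edges, so the final count can
    # exceed the budget by at most len(weights) - 1.
    while sum(int(m.sum()) for m in masks) < budget:
        unit = walk_index % n_inputs  # round-robin start (assumption)
        walk_index += 1
        for w, m in zip(weights, masks):
            p = np.abs(w[unit])
            p = p / p.sum()  # bias the walk toward larger |weight|
            nxt = int(rng.choice(w.shape[1], p=p))
            m[unit, nxt] = True
            unit = nxt
    return masks


def layer_widths(masks):
    """Number of units in each layer that receive at least one kept edge."""
    return [int(m.any(axis=0).sum()) for m in masks]
```

Because every kept edge lies on a complete input-to-output path, no unit is left with incoming edges but no outgoing ones. Calling `layer_widths` on the result gives a quick check for the "bottleneck" layers that the abstract attributes to SynFlow-L2.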

Code Implementations: 1 repo
