LG CV ML · Jun 25, 2019

The Difficulty of Training Sparse Neural Networks

Utku Evci, Fabian Pedregosa, Aidan Gomez, Erich Elsen

arXiv:1906.10732v3

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of efficiently training sparse models, a practical concern for machine learning practitioners. Its contribution is incremental: it builds on prior observations and does not propose a new method.

The paper investigates why training sparse neural networks from scratch often yields worse solutions than pruning. It finds that a linear path with a monotonically decreasing objective exists from initialization to good solutions, whereas escaping bad sparse solutions appears to require traversing the dense subspace.

We investigate the difficulties of training sparse neural networks and make new observations about optimization dynamics and the energy landscape within the sparse regime. Recent work (Gale et al., 2019; Liu et al., 2018) has shown that sparse ResNet-50 architectures trained on the ImageNet-2012 dataset converge to solutions that are significantly worse than those found by pruning. We show that, despite the failure of optimizers, there is a linear path with a monotonically decreasing objective from the initialization to the "good" solution. Additionally, our attempts to find a decreasing objective path from "bad" solutions to the "good" ones in the sparse subspace fail. However, if we allow the path to traverse the dense subspace, then we consistently find a path between the two solutions. These findings suggest that traversing extra dimensions may be needed to escape stationary points found in the sparse subspace.
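The linear-path observation in the abstract can be illustrated with a minimal sketch: evaluate the objective at evenly spaced points on the straight line between two parameter vectors and check that it never increases. Everything named here (`interpolate`, `loss_along_line`, `is_monotonically_decreasing`, `toy_loss`, and the three-dimensional parameter vectors) is a hypothetical stand-in chosen for illustration; the paper works with sparse ResNet-50 on ImageNet-2012, not with this toy quadratic.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate(start: Sequence[float], end: Sequence[float], alpha: float) -> Vector:
    """Return the point on the straight line from start (alpha=0) to end (alpha=1)."""
    return [(1.0 - alpha) * s + alpha * e for s, e in zip(start, end)]


def loss_along_line(
    loss: Callable[[Sequence[float]], float],
    start: Sequence[float],
    end: Sequence[float],
    steps: int = 11,
) -> List[float]:
    """Evaluate the loss at `steps` evenly spaced points between start and end."""
    alphas = [i / (steps - 1) for i in range(steps)]
    return [loss(interpolate(start, end, a)) for a in alphas]


def is_monotonically_decreasing(values: Sequence[float], tol: float = 1e-12) -> bool:
    """True if no value exceeds its predecessor by more than tol."""
    return all(b <= a + tol for a, b in zip(values, values[1:]))


def toy_loss(theta: Sequence[float]) -> float:
    """Toy convex objective (an assumption for illustration, not the paper's loss)."""
    target = [1.0, -2.0, 0.5]
    return sum((t - g) ** 2 for t, g in zip(theta, target))


theta_init = [0.0, 0.0, 0.0]   # hypothetical initialization
theta_good = [1.0, -2.0, 0.5]  # hypothetical "good" solution

losses = loss_along_line(toy_loss, theta_init, theta_good)
print(is_monotonically_decreasing(losses))  # prints True for this convex toy loss
```

For a convex toy loss the check passes trivially. The notable part of the paper's finding is that the same kind of check succeeds on a non-convex sparse network loss even though the optimizers fail to reach the "good" solution.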
