FreezeOut: Accelerate Training by Progressively Freezing Layers
The method targets training efficiency for deep learning practitioners, but the contribution is incremental, since it is applied on top of existing network architectures.
The paper tackles the high computational cost of training deep neural networks by progressively freezing layers and excluding them from the backward pass. In experiments on CIFAR, it reports wall-clock savings of up to 20%, with a 3% loss in accuracy for DenseNets, no loss in accuracy for ResNets, and no improvement for VGG networks.
The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training with 3% loss in accuracy for DenseNets, a 20% speedup without loss of accuracy for ResNets, and no improvement for VGG networks. Our code is publicly available at https://github.com/ajbrock/FreezeOut
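The freezing schedule described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear spacing of freeze points, the parameter `t0` (the fraction of training after which the first layer is frozen), and the helper names `freeze_points`, `trainable_mask`, and `backward_start` are all hypothetical choices made for this example. The abstract only states that layers are frozen one by one and dropped from the backward pass.

```python
from typing import List, Optional


def freeze_points(num_layers: int, t0: float = 0.5) -> List[float]:
    """Fraction of the training run after which each layer is frozen.

    Assumption for this sketch: freeze points are spaced linearly, so the
    earliest layer freezes at `t0` and the last layer trains until 1.0.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if not 0.0 < t0 <= 1.0:
        raise ValueError("t0 must lie in (0, 1]")
    if num_layers == 1:
        return [1.0]
    step = (1.0 - t0) / (num_layers - 1)
    return [t0 + i * step for i in range(num_layers)]


def trainable_mask(progress: float, points: List[float]) -> List[bool]:
    """True for every layer that is still being updated at `progress` in [0, 1]."""
    return [progress < t for t in points]


def backward_start(progress: float, points: List[float]) -> Optional[int]:
    """Index of the earliest layer that still needs gradients.

    Layers freeze from the input side first, so the frozen layers always form
    a prefix of the network and the backward pass can stop at this index.
    Returns None once every layer is frozen.
    """
    for index, still_training in enumerate(trainable_mask(progress, points)):
        if still_training:
            return index
    return None


if __name__ == "__main__":
    points = freeze_points(num_layers=5, t0=0.5)
    for progress in (0.0, 0.55, 0.7, 0.9):
        print(progress, backward_start(progress, points))
```

In a real training loop, the index returned by `backward_start` would mark where backpropagation stops, which is where the computational saving comes from: frozen early layers still run in the forward pass, but they no longer contribute gradient computation.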