LG · AI · DC · Jan 17, 2022

Egeria: Efficient DNN Training with Knowledge-Guided Layer Freezing

arXiv:2201.06227v2 · 66 citations
AI Analysis

This paper addresses DNN training efficiency for practitioners. It is incremental in that it builds on existing layer-freezing ideas, but it contributes a novel knowledge-guided method for deciding which layers to freeze.

The paper tackles the time-consuming nature of DNN training by proposing Egeria, a system that freezes well-trained layers during training to skip computation and communication, achieving 19%-43% training speedup without accuracy loss.

Training deep neural networks (DNNs) is time-consuming. While most existing solutions try to overlap/schedule computation and communication for efficient training, this paper goes one step further by skipping computing and communication through DNN layer freezing. Our key insight is that the training progress of internal DNN layers differs significantly, and front layers often become well-trained much earlier than deep layers. To explore this, we first introduce the notion of training plasticity to quantify the training progress of internal DNN layers. Then we design Egeria, a knowledge-guided DNN training system that employs semantic knowledge from a reference model to accurately evaluate individual layers' training plasticity and safely freeze the converged ones, saving their corresponding backward computation and communication. Our reference model is generated on the fly using quantization techniques and runs forward operations asynchronously on available CPUs to minimize the overhead. In addition, Egeria caches the intermediate outputs of the frozen layers with prefetching to further skip the forward computation. Our implementation and testbed experiments with popular vision and language models show that Egeria achieves 19%-43% training speedup w.r.t. the state-of-the-art without sacrificing accuracy.
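The freezing decision described in the abstract can be illustrated with a small sketch. This is not Egeria's actual algorithm: the class name `FreezeController`, the `window` and `tolerance` parameters, and the "spread of recent readings" stability test are all assumptions made for illustration. The sketch only captures the stated idea that front layers converge first, so layers are frozen front to back once their plasticity readings stop changing.

```python
from collections import deque


class FreezeController:
    """Hypothetical helper (not from the paper): tracks per-layer
    plasticity readings and decides how many front layers to freeze."""

    def __init__(self, num_layers, window=3, tolerance=0.01):
        self.num_layers = num_layers
        self.window = window        # assumed: number of recent readings compared
        self.tolerance = tolerance  # assumed: max spread that counts as "stable"
        self.history = [deque(maxlen=window) for _ in range(num_layers)]
        self.frozen_upto = 0        # layers [0, frozen_upto) are frozen

    def record(self, layer, plasticity):
        """Store one plasticity reading for a layer."""
        self.history[layer].append(plasticity)

    def _is_stable(self, layer):
        """A layer is stable when its last `window` readings barely differ."""
        h = self.history[layer]
        return len(h) == self.window and max(h) - min(h) <= self.tolerance

    def update(self):
        """Freeze strictly front to back; the last layer is never frozen
        so that at least one layer keeps training."""
        while (self.frozen_upto < self.num_layers - 1
               and self._is_stable(self.frozen_upto)):
            self.frozen_upto += 1
        return self.frozen_upto


# Made-up readings for a 3-layer model: layer 0 flattens out first.
ctrl = FreezeController(num_layers=3, window=3, tolerance=0.01)
readings = [
    (0.500, 0.90, 1.20),
    (0.201, 0.70, 1.00),
    (0.200, 0.50, 0.90),
    (0.199, 0.40, 0.80),
]
for step in readings:
    for layer, value in enumerate(step):
        ctrl.record(layer, value)
    ctrl.update()

print(ctrl.frozen_upto)  # number of front layers currently frozen
```

In a real training loop, the frozen prefix would then be excluded from backward computation and gradient communication, which is where the abstract says the savings come from. How the plasticity values are computed (in the paper, by comparison against a quantized reference model) is outside the scope of this sketch.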
