LG · ML · Sep 28, 2020

On Efficient Constructions of Checkpoints

arXiv:2009.13003v1 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient checkpointing in deep learning training and offers significant reductions in checkpoint storage size and recovery time. The contribution is incremental, however, since it builds on existing methods such as SCAR.

The paper tackles the problem of efficiently constructing checkpoints during deep learning training. It proposes LC-Checkpoint, a lossy compression scheme that combines quantization, priority promotion, and Huffman coding to optimize both compression rate and recovery speed, under the assumption that the model is trained with SGD. It reports a compression rate of up to 28x and a recovery speedup of up to 5.77x over the state-of-the-art SCAR algorithm.
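The summary names quantization and priority promotion but does not spell out how they work. The sketch below is one plausible reading, offered as an assumption rather than the paper's method: it treats each entry of a hypothetical checkpoint delta (the parameter change since the previous checkpoint) by its sign and base-2 exponent, keeps only the `num_buckets` buckets with the largest exponents (our interpretation of "priority promotion" as favoring the largest-magnitude updates), maps everything else to zero, and stores each kept bucket's mean as its representative value. `quantize_delta`, `num_buckets`, and the bucket-mean choice are all illustrative names and choices, not taken from LC-Checkpoint.

```python
import math
from typing import List, Optional, Tuple

Bucket = Tuple[int, int]  # (sign, base-2 exponent)

def exponent_bucket(x: float) -> Optional[Bucket]:
    """Map a nonzero value to (sign, exponent); exact zeros get no bucket."""
    if x == 0.0:
        return None
    _, exp = math.frexp(x)  # x == m * 2**exp with 0.5 <= |m| < 1
    return (1 if x > 0 else -1, exp)

def quantize_delta(delta: List[float], num_buckets: int) -> Tuple[List[int], List[float]]:
    """Hypothetical exponent-based quantizer with assumed 'priority promotion'.

    Keeps the num_buckets (sign, exponent) buckets with the largest
    exponents, i.e. the largest-magnitude updates, and maps every other
    entry to code 0, which decodes to 0.0.
    """
    buckets = [exponent_bucket(x) for x in delta]
    distinct = sorted({b for b in buckets if b is not None},
                      key=lambda b: (b[1], b[0]), reverse=True)
    kept = distinct[:num_buckets]
    code_of = {b: i + 1 for i, b in enumerate(kept)}  # code 0 = dropped
    codes = [code_of.get(b, 0) for b in buckets]

    # Representative value per code: mean of the entries assigned to it.
    sums = [0.0] * (len(kept) + 1)
    counts = [0] * (len(kept) + 1)
    for x, c in zip(delta, codes):
        if c:
            sums[c] += x
            counts[c] += 1
    reps = [0.0] + [sums[c] / counts[c] for c in range(1, len(kept) + 1)]
    return codes, reps

def dequantize(codes: List[int], reps: List[float]) -> List[float]:
    """Approximate the delta from per-entry codes and per-code representatives."""
    return [reps[c] for c in codes]
```

With `delta = [0.5, -0.25, 0.001, 0.75, -0.0002, 0.6]` and `num_buckets=2`, the three entries with exponent 0 share code 1 (decoded as their mean), -0.25 gets code 2, and the two tiny entries are dropped to 0. The resulting codes form a small, skewed integer stream, which is the kind of input an entropy coder compresses well.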

Efficient construction of checkpoints/snapshots is a critical tool for training and diagnosing deep learning models. In this paper, we propose a lossy compression scheme for checkpoint constructions (called LC-Checkpoint). LC-Checkpoint simultaneously maximizes the compression rate and optimizes the recovery speed, under the assumption that SGD is used to train the model. LC-Checkpoint uses quantization and priority promotion to store the most crucial information for SGD to recover, and then uses Huffman coding to leverage the non-uniform distribution of the gradient scales. Our extensive experiments show that LC-Checkpoint achieves a compression rate of up to 28× and a recovery speedup of up to 5.77× over a state-of-the-art algorithm (SCAR).
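The abstract credits Huffman coding with exploiting the non-uniform distribution of gradient scales. As a minimal sketch, assuming the quantization step emits small integer codes in which one value (for example, a "dropped" code 0) dominates, a textbook Huffman coder assigns the shortest bit strings to the most frequent codes. The functions `huffman_code`, `encode`, and `decode` are illustrative, not LC-Checkpoint's implementation, and the symbol stream in the usage note is made up to show the effect.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Hashable, List, Sequence

def huffman_code(symbols: Sequence[Hashable]) -> Dict[Hashable, str]:
    """Build a prefix-free Huffman code (symbol -> bit string) from symbol counts."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A stream with a single distinct symbol still needs one bit per symbol.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique tie-breaker so the dicts are never compared
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing one bit to each side.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def encode(symbols: Sequence[Hashable], code: Dict[Hashable, str]) -> str:
    """Concatenate the bit string of every symbol."""
    return "".join(code[s] for s in symbols)

def decode(bits: str, code: Dict[Hashable, str]) -> List[Hashable]:
    """Invert encode(); the code is prefix-free, so the first match is correct."""
    inverse = {c: s for s, c in code.items()}
    out: List[Hashable] = []
    cur = ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out
```

For the skewed stream `[0] * 12 + [1] * 3 + [2]`, the most frequent code 0 receives a 1-bit codeword and the other two receive 2-bit codewords, so the 16 symbols encode to 20 bits instead of the 32 bits a fixed 2-bit encoding would need. The more skewed the distribution, the larger the saving, which is the property the abstract appeals to.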
