TimelyFreeze: Adaptive Parameter Freezing Mechanism for Pipeline Parallelism
This work offers a concrete training-throughput gain for researchers and practitioners who train large-scale models with pipeline parallelism, while keeping model accuracy comparable.
This paper addresses pipeline bubbles in pipeline parallelism, which limit throughput when training large models. The authors propose TimelyFreeze, a method that computes parameter freeze ratios to minimize batch execution time while maintaining accuracy, achieving up to 40% training throughput improvement on LLaMA-8B.
Pipeline parallelism enables training models that exceed single-device memory, but practical throughput remains limited by pipeline bubbles. Although parameter freezing can improve training throughput by adaptively skipping backward computation, existing methods often over-freeze parameters, resulting in unnecessary accuracy degradation. To address this issue, we propose TimelyFreeze, which models the pipeline schedule as a directed acyclic graph and solves a linear program to compute optimal freeze ratios that minimize batch execution time under accuracy constraints. Experiments show that TimelyFreeze achieves up to 40% training throughput improvement on LLaMA-8B with comparable accuracy. Overall, it enables faster large-scale model training without compromising convergence and generalizes across diverse pipeline-parallel settings.
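To make the DAG view concrete, the following is a minimal sketch, not the authors' implementation. It assumes a GPipe-style schedule (all forwards, then all backwards, on each stage), uniform per-microbatch durations, and a backward time that shrinks linearly with a stage's freeze ratio. The function name `gpipe_makespan` and the parameters `fwd`, `bwd_full`, `bwd_frozen`, and `freeze` are hypothetical and chosen only for illustration. The sketch evaluates batch execution time as the longest path through the schedule DAG for given freeze ratios; it does not solve the linear program or model the accuracy constraints.

```python
from functools import lru_cache


def gpipe_makespan(num_stages, num_microbatches, fwd, bwd_full, bwd_frozen, freeze):
    """Batch time of a GPipe-style schedule, computed as the DAG's longest path.

    freeze[s] in [0, 1] is the (assumed) freeze ratio of stage s. A backward
    pass is assumed to cost bwd_full when nothing is frozen and bwd_frozen
    when the whole stage is frozen, with linear interpolation in between.
    """

    def duration(kind, s):
        if kind == "F":
            return fwd
        return bwd_full - freeze[s] * (bwd_full - bwd_frozen)

    def predecessors(kind, s, m):
        preds = []
        if kind == "F":
            if s > 0:
                preds.append(("F", s - 1, m))  # activations from previous stage
            if m > 0:
                preds.append(("F", s, m - 1))  # same device, previous microbatch
        else:
            if s < num_stages - 1:
                preds.append(("B", s + 1, m))  # gradients from next stage
            else:
                preds.append(("F", s, m))  # last stage: own forward comes first
            if m > 0:
                preds.append(("B", s, m - 1))  # same device, previous microbatch
            else:
                # the device finishes all of its forwards before any backward
                preds.append(("F", s, num_microbatches - 1))
        return preds

    @lru_cache(maxsize=None)
    def finish(kind, s, m):
        # An action starts once every predecessor in the DAG has finished.
        start = max((finish(*p) for p in predecessors(kind, s, m)), default=0.0)
        return start + duration(kind, s)

    return max(finish("B", s, num_microbatches - 1) for s in range(num_stages))


if __name__ == "__main__":
    for ratios in [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
        t = gpipe_makespan(2, 2, fwd=1.0, bwd_full=2.0, bwd_frozen=1.0, freeze=ratios)
        print(ratios, t)
```

With these made-up durations, the two-stage, two-microbatch schedule takes 9 time units with nothing frozen, 8 when only one stage is fully frozen, and 6 when both are. Because each action's start time is a maximum over its predecessors' finish times, the same structure can be written as linear inequalities, which is what makes a linear-programming formulation over freeze ratios natural.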