LG · AI · Oct 22, 2025

Study of Training Dynamics for Memory-Constrained Fine-Tuning

arXiv:2510.19675v1 · 1 citation · h-index: 18
Originality: Highly original
AI Analysis

This paper addresses the problem of adapting large models in memory-constrained deployment environments, offering a novel method for memory-efficient fine-tuning.

The paper tackles memory-efficient fine-tuning of large neural networks by proposing TraDy, a transfer learning scheme that uses dynamic stochastic channel selection and layer importance analysis, achieving up to 99% activation sparsity and 97% reduction in FLOPs for weight derivative computation.

Memory-efficient training of deep neural networks has become increasingly important as models grow larger while deployment environments impose strict resource constraints. We propose TraDy, a novel transfer learning scheme leveraging two key insights: layer importance for updates is architecture-dependent and determinable a priori, while dynamic stochastic channel selection provides superior gradient approximation compared to static approaches. We introduce a dynamic channel selection approach that stochastically resamples channels between epochs within preselected layers. Extensive experiments demonstrate TraDy achieves state-of-the-art performance across various downstream tasks and architectures while maintaining strict memory constraints, achieving up to 99% activation sparsity, 95% weight derivative sparsity, and 97% reduction in FLOPs for weight derivative computation.
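The per-epoch resampling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the layer names, per-channel costs, budget units, and the uniform sampling rule are all assumptions made for illustration, and the abstract does not specify how channels are weighted or how memory is accounted.

```python
import random


def sample_channels(layer_channels, channel_cost, budget, rng):
    """Draw a random set of (layer, channel) pairs whose total cost fits the budget.

    layer_channels: dict mapping a preselected layer name to its channel count.
    channel_cost:   dict mapping a layer name to the (assumed) cost of one channel.
    budget:         total cost allowed per epoch, in the same arbitrary units.
    """
    # Enumerate every channel of every preselected layer.
    candidates = [
        (layer, ch)
        for layer, count in layer_channels.items()
        for ch in range(count)
    ]
    # Uniform shuffle: an assumed sampling rule, chosen for simplicity.
    rng.shuffle(candidates)

    selected, used = [], 0
    for layer, ch in candidates:
        cost = channel_cost[layer]
        if used + cost <= budget:
            selected.append((layer, ch))
            used += cost
    return selected, used


# Hypothetical preselected layers and costs (illustrative values only).
layer_channels = {"block3.conv": 8, "block4.conv": 16}
channel_cost = {"block3.conv": 4, "block4.conv": 2}
budget = 24

rng = random.Random(0)
# A fresh selection is drawn at the start of each epoch; only the selected
# channels would have their activations stored and weight gradients computed.
selections = [
    sample_channels(layer_channels, channel_cost, budget, rng)
    for _epoch in range(3)
]
```

In this sketch the set of layers stays fixed across epochs while the channels inside them change, which mirrors the split the abstract draws between a priori layer selection and dynamic channel selection. A static approach would correspond to calling `sample_channels` once and reusing the result for every epoch.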

