LG | AI | CL | Aug 25, 2025

Proximal Supervised Fine-Tuning

arXiv:2508.17784v1 | 12 citations | h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses generalization issues in fine-tuning for AI models, offering a method to maintain prior capabilities while adapting to new tasks, though it is incremental as it builds on existing RL techniques like TRPO and PPO.

The paper tackles the problem of poor generalization in supervised fine-tuning (SFT) of foundation models, where prior capabilities deteriorate after tuning on new tasks or domains, and proposes Proximal SFT (PSFT) to address this. Experiments show that PSFT matches SFT in-domain, outperforms it in out-of-domain generalization, remains stable under prolonged training without entropy collapse, and provides a stronger foundation for subsequent optimization.

Supervised fine-tuning (SFT) of foundation models often leads to poor generalization, where prior capabilities deteriorate after tuning on new tasks or domains. Inspired by trust-region policy optimization (TRPO) and proximal policy optimization (PPO) in reinforcement learning (RL), we propose Proximal SFT (PSFT). This fine-tuning objective incorporates the benefits of a trust region, effectively constraining policy drift during SFT while maintaining competitive tuning performance. By viewing SFT as a special case of policy gradient methods with constant positive advantages, we derive PSFT, which stabilizes optimization and improves generalization while leaving room for further optimization in subsequent post-training stages. Experiments across mathematical and human-value domains show that PSFT matches SFT in-domain, outperforms it in out-of-domain generalization, remains stable under prolonged training without causing entropy collapse, and provides a stronger foundation for subsequent optimization.
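The idea of treating SFT as a policy gradient with a constant positive advantage can be illustrated with a PPO-style clipped surrogate. The sketch below is a minimal illustration under stated assumptions, not the paper's confirmed objective: the function names, the clipping threshold `clip_eps=0.2`, and the choice of advantage `1.0` are all hypothetical, and the log-probabilities are assumed to be those of the target tokens under the current policy and a frozen "old" policy.

```python
import math


def clipped_sft_token_loss(logp_new, logp_old, clip_eps=0.2, advantage=1.0):
    """PPO-style clipped surrogate loss for a single target token.

    logp_new: log-probability of the target token under the current policy.
    logp_old: log-probability of the same token under the frozen old policy.
    clip_eps and advantage are illustrative assumptions, not values from the paper.
    """
    # Probability ratio between the current and old policies.
    ratio = math.exp(logp_new - logp_old)
    # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # PPO keeps the pessimistic (smaller) surrogate; negate it to obtain a loss.
    return -min(ratio * advantage, clipped * advantage)


def clipped_sft_loss(logps_new, logps_old, clip_eps=0.2):
    """Mean clipped surrogate loss over a sequence of target tokens."""
    losses = [
        clipped_sft_token_loss(new, old, clip_eps)
        for new, old in zip(logps_new, logps_old)
    ]
    return sum(losses) / len(losses)
```

With a positive advantage, the loss becomes constant once the ratio exceeds `1 + clip_eps`, so a token whose probability has already risen that far contributes no further gradient. This is one way a trust-region-style constraint could limit policy drift during fine-tuning.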

