Supervised Fine-Tuning Needs to Unlock the Potential of Token Priority
This is an incremental position paper that reframes existing methods for researchers in AI alignment and fine-tuning.
The paper argues that Supervised Fine-Tuning (SFT) should use Token Priority to address a granularity mismatch in aligning AI models with human utility, categorizing approaches into Positive Priority for noise filtration and Signed Priority for toxic mode unlearning.
The transition from fitting empirical data to achieving true human utility is fundamentally constrained by a granularity mismatch, where fine-grained autoregressive generation is often supervised by coarse or uniform signals. This position paper advocates Token Priority as the essential bridge, formalizing Supervised Fine-Tuning (SFT) not as simple optimization but as a precise distribution reshaping process that aligns raw data with the ideal alignment manifold. We analyze recent breakthroughs through this unified lens, categorizing them into two distinct regimes: Positive Priority for noise filtration and Signed Priority for toxic mode unlearning. We revisit existing progress and limitations, identify key challenges, and suggest directions for future research.
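To make the two regimes concrete, the following is a minimal sketch of a per-token weighted negative log-likelihood. It is an illustration under stated assumptions, not the paper's formalization: the function name `token_weighted_nll`, the normalization by total absolute weight, and the example priorities are all hypothetical. Non-negative weights correspond to the Positive Priority regime (a weight of zero filters a token out), while a negative weight corresponds to the Signed Priority regime (minimizing the loss pushes that token's likelihood down).

```python
import math


def token_weighted_nll(token_log_probs, priorities):
    """Priority-weighted negative log-likelihood for one sequence.

    token_log_probs: log p(y_t | y_<t, x) for each target token.
    priorities: one weight per token. This weighting scheme is an
    assumption made for illustration, not the paper's stated objective.
    """
    if len(token_log_probs) != len(priorities):
        raise ValueError("need exactly one priority per token")
    # Normalize by total absolute weight so the scale stays comparable
    # across sequences (an illustrative choice).
    total = sum(abs(w) for w in priorities)
    if total == 0:
        return 0.0
    weighted = sum(w * lp for w, lp in zip(priorities, token_log_probs))
    return -weighted / total


# Hypothetical per-token probabilities under the model.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.125)]

# Uniform supervision: every token counts equally (standard SFT).
uniform = token_weighted_nll(log_probs, [1.0, 1.0, 1.0])

# Positive priority: the last token is treated as noise and filtered out.
positive = token_weighted_nll(log_probs, [1.0, 1.0, 0.0])

# Signed priority: the last token gets a negative weight, so lowering
# its log-probability lowers the loss.
signed = token_weighted_nll(log_probs, [1.0, 1.0, -1.0])
```

With uniform priorities the function reduces to the usual mean token-level cross-entropy, which is the coarse signal that the abstract contrasts with token-level supervision.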