CL, AI | Jan 14

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Tao Liu, Taiqiang Wu, Runming Yang, Shaoning Sun, Junjie Wang, Yujiu Yang

arXiv:2601.09195v15.012 citations | h-index: 11

Originality: Incremental advance

AI Analysis

ProFit addresses overfitting in SFT for LLMs and offers a more efficient alternative to training on multiple reference answers, though the approach itself is incremental.

The paper tackles the problem of overfitting in supervised fine-tuning (SFT) of large language models by proposing ProFit, a method that masks low-probability tokens to prevent surface-level overfitting, resulting in consistent performance improvements on general reasoning and mathematical benchmarks.

Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, leading to the model overfitting to non-core expressions. Although our empirical analysis suggests that introducing multiple reference answers can mitigate this issue, the prohibitive data and computational costs necessitate a strategic shift: prioritizing the mitigation of single-reference overfitting over the costly pursuit of answer diversity. To achieve this, we reveal the intrinsic connection between token probability and semantic importance: high-probability tokens carry the core logical framework, while low-probability tokens are mostly replaceable expressions. Based on this insight, we propose ProFit, which selectively masks low-probability tokens to prevent surface-level overfitting. Extensive experiments confirm that ProFit consistently outperforms traditional SFT baselines on general reasoning and mathematical benchmarks.
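The abstract does not specify how ProFit chooses which tokens to mask, so the following is only a minimal sketch of the general idea of probability-guided token masking, not the authors' implementation. It assumes that a token's probability is the model's own softmax probability for the reference token, and that tokens below a fixed cutoff are dropped from the cross-entropy loss. The function name `masked_sft_loss` and the `threshold` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_sft_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    threshold: float = 0.5,  # hypothetical cutoff; not taken from the paper
) -> Tuple[float, List[bool]]:
    """Average negative log-likelihood over reference tokens whose model
    probability is at least `threshold`; lower-probability tokens are masked.

    Returns (loss, keep_mask). The loss is 0.0 if every token is masked.
    """
    kept_nll: List[float] = []
    keep_mask: List[bool] = []
    for row, target in zip(logits, targets):
        p = softmax(row)[target]   # model probability of the reference token
        keep = p >= threshold      # assumption: a simple fixed-threshold rule
        keep_mask.append(keep)
        if keep:
            kept_nll.append(-math.log(p))
    loss = sum(kept_nll) / len(kept_nll) if kept_nll else 0.0
    return loss, keep_mask


# Toy example: three positions over a vocabulary of size 3.
toy_logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
toy_targets = [0, 2, 1]
loss, keep_mask = masked_sft_loss(toy_logits, toy_targets, threshold=0.5)
print(keep_mask)  # the middle token has probability 1/3 and is masked
```

In a real training loop, the mask would typically be computed from detached probabilities so that it selects tokens without itself receiving gradients. That is an implementation choice assumed here and not described in the abstract.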
