CL · Mar 12, 2025

DAST: Difficulty-Aware Self-Training on Large Language Models

Boyang Xue, Qi Zhu, Hongru Wang, Rui Wang, Sheng Wang, Hongling Xu, Fei Mi, Yasheng Wang, Lifeng Shang, Qun Liu, Kam-Fai Wong

arXiv:2503.09029v1 · 6.72 citations · h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in LLM self-training that limits performance on challenging tasks, but the advance is incremental: it adds a difficulty-aware strategy to existing self-training methods.

The paper tackles the under-sampling of challenging queries in LLM self-training, which limits model ability on difficult problems. It proposes a difficulty-aware self-training (DAST) framework that improves the quantity and quality of self-generated responses to such queries. Experiments on mathematical tasks demonstrate its effectiveness and generalization.

Present Large Language Model (LLM) self-training methods always under-sample challenging queries, leading to inadequate learning on difficult problems, which limits LLMs' ability. Therefore, this work proposes a difficulty-aware self-training (DAST) framework that focuses on improving both the quantity and quality of self-generated responses on challenging queries during self-training. DAST consists of three components: 1) sampling-based difficulty level estimation, 2) difficulty-aware data augmentation, and 3) a self-training algorithm using SFT and DPO, respectively. Experiments on mathematical tasks demonstrate the effectiveness and generalization of DAST, highlighting the critical role of difficulty-aware strategies in advancing LLM self-training.
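
The first component listed above, sampling-based difficulty level estimation, can be illustrated with a minimal sketch. The abstract does not give the procedure, so everything here is an assumption: the function name `estimate_difficulty`, the exact-match correctness check, the default of 8 samples, and the 0.25/0.75 thresholds are hypothetical choices for illustration, not the authors' method.

```python
from typing import Callable, Tuple


def estimate_difficulty(
    query: str,
    reference: str,
    sample_fn: Callable[[str], str],
    num_samples: int = 8,
    thresholds: Tuple[float, float] = (0.25, 0.75),
) -> str:
    """Label a query as "hard", "medium", or "easy" from sampled answers.

    Hypothetical sketch: `sample_fn` stands in for one stochastic model call
    that returns a final answer string. Difficulty is taken to be the fraction
    of sampled answers that exactly match the reference (an assumption).
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")

    # Draw num_samples answers and count how many match the reference.
    correct = sum(
        sample_fn(query).strip() == reference.strip()
        for _ in range(num_samples)
    )
    accuracy = correct / num_samples

    # Assumed thresholds: low sampled accuracy means a harder query.
    low, high = thresholds
    if accuracy < low:
        return "hard"
    if accuracy < high:
        return "medium"
    return "easy"
```

Under this sketch, queries labeled "hard" are the ones a difficulty-aware pipeline would target with additional sampling or augmentation, since plain self-training yields few correct responses for them.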
