DAST: Difficulty-Aware Self-Training on Large Language Models
This work addresses a bottleneck in LLM self-training that limits performance on challenging tasks, but its contribution is incremental: it extends existing self-training methods with a difficulty-aware strategy.
The paper tackles the tendency of LLM self-training to under-sample challenging queries, which limits model ability on difficult problems. It proposes a difficulty-aware self-training (DAST) framework that increases both the quantity and the quality of self-generated responses on such queries, and experiments on mathematical tasks demonstrate its effectiveness and generalization.
Current self-training methods for Large Language Models (LLMs) consistently under-sample challenging queries, leading to inadequate learning on difficult problems and limiting LLMs' abilities. This work therefore proposes a difficulty-aware self-training (DAST) framework that improves both the quantity and the quality of self-generated responses on challenging queries during self-training. DAST consists of three components: 1) sampling-based difficulty level estimation, 2) difficulty-aware data augmentation, and 3) the self-training algorithm, instantiated with SFT and with DPO. Experiments on mathematical tasks demonstrate the effectiveness and generalization of DAST, highlighting the critical role of difficulty-aware strategies in advancing LLM self-training.
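The first two components can be illustrated with a minimal Python sketch. It is not the paper's procedure; it assumes that difficulty is the fraction of k sampled answers that miss the gold answer, that the per-query sampling budget grows linearly with that difficulty, and that SFT data keeps only correct responses. The names `estimate_difficulty`, `allocate_budget`, `build_sft_data`, and the stub sampler are hypothetical. The sketch covers only the quantity side of augmentation, not the paper's quality improvements or its DPO variant.

```python
import itertools
from typing import Callable, Dict, List, Tuple

# Hypothetical sampler: given a query, return one model-generated final answer.
Sampler = Callable[[str], str]


def estimate_difficulty(query: str, gold: str, sample: Sampler, k: int = 8) -> float:
    """Assumed estimator: fraction of k sampled answers that differ from the gold answer.

    0.0 means every sample was correct (easy); 1.0 means none were (hard).
    """
    wrong = sum(1 for _ in range(k) if sample(query) != gold)
    return wrong / k


def allocate_budget(difficulty: Dict[str, float], base: int = 2, extra: int = 8) -> Dict[str, int]:
    """Assumed difficulty-aware allocation: harder queries get more sampling attempts."""
    return {q: base + round(extra * d) for q, d in difficulty.items()}


def build_sft_data(
    golds: Dict[str, str], budget: Dict[str, int], sample: Sampler
) -> List[Tuple[str, str]]:
    """Draw budget[q] samples per query and keep only the correct ones as SFT pairs."""
    data: List[Tuple[str, str]] = []
    for q, n in budget.items():
        for _ in range(n):
            response = sample(q)
            if response == golds[q]:
                data.append((q, response))
    return data


# Usage with a deterministic stub model (hypothetical data, for illustration only).
golds = {"easy": "4", "hard": "9"}
_outputs = {
    "easy": itertools.cycle(["4"]),                 # always correct
    "hard": itertools.cycle(["7", "7", "7", "9"]),  # correct one time in four
}


def stub_sample(query: str) -> str:
    """Return the next canned answer for this query."""
    return next(_outputs[query])


difficulty = {q: estimate_difficulty(q, g, stub_sample) for q, g in golds.items()}
budget = allocate_budget(difficulty)
sft_data = build_sft_data(golds, budget, stub_sample)
print(difficulty, budget, sft_data)
```

With the stub sampler, the hard query is answered correctly only one time in four, so it receives a larger budget (8 samples versus 2) and ends up contributing as many correct responses as the easy query. Under a uniform budget of 2 samples per query, the hard query would contribute none in this example.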