CL | May 24, 2025

TAG-INSTRUCT: Controlled Instruction Complexity Enhancement through Structure-based Augmentation

He Zhu, Zhiwen Ruan, Junyou Su, Xingwei He, Yun Chen, Wenjia Zhang, Guanhua Chen

arXiv:2505.18557v28.32 citations | h-index: 6 | ACL

Originality: Incremental advance

AI Analysis

This work addresses the need for high-quality, complexity-controlled instruction data in LLM development. It is an incremental improvement over earlier prompt-based approaches.

The paper tackles the problem of controlling instruction complexity for large language models. It introduces TAG-INSTRUCT, a framework that compresses instructions into a compact tag space and increases their complexity through RL-guided tag expansion. The authors report that it outperforms existing instruction complexity augmentation approaches and offers better controllability and stability across instruction synthesis frameworks.

High-quality instruction data is crucial for developing large language models (LLMs), yet existing approaches struggle to effectively control instruction complexity. We present TAG-INSTRUCT, a novel framework that enhances instruction complexity through structured semantic compression and controlled difficulty augmentation. Unlike previous prompt-based methods operating on raw text, TAG-INSTRUCT compresses instructions into a compact tag space and systematically enhances complexity through RL-guided tag expansion. Through extensive experiments, we show that TAG-INSTRUCT outperforms existing instruction complexity augmentation approaches. Our analysis reveals that operating in tag space provides superior controllability and stability across different instruction synthesis frameworks.

