TAG-INSTRUCT: Controlled Instruction Complexity Enhancement through Structure-based Augmentation
This work addresses the need for high-quality instruction data with controllable complexity in LLM development. It is an incremental improvement over previous prompt-based approaches.
The paper tackles the problem of controlling instruction complexity for large language models. It introduces TAG-INSTRUCT, a framework that compresses instructions into a tag space and increases their complexity through RL-guided tag expansion. The authors report that it outperforms existing methods and offers improved controllability and stability.
High-quality instruction data is crucial for developing large language models (LLMs), yet existing approaches struggle to effectively control instruction complexity. We present TAG-INSTRUCT, a novel framework that enhances instruction complexity through structured semantic compression and controlled difficulty augmentation. Unlike previous prompt-based methods operating on raw text, TAG-INSTRUCT compresses instructions into a compact tag space and systematically enhances complexity through RL-guided tag expansion. Through extensive experiments, we show that TAG-INSTRUCT outperforms existing instruction complexity augmentation approaches. Our analysis reveals that operating in tag space provides superior controllability and stability across different instruction synthesis frameworks.
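The compress, expand, and reconstruct flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. Every name here (`KEYWORD_TAGS`, `CANDIDATE_TAGS`, `compress`, `expand`, `reconstruct`, `toy_reward`) is hypothetical. The keyword lookup stands in for whatever model performs the compression, and a greedy choice under a hand-written reward stands in for the RL-guided expansion policy. The `budget` parameter is an assumption meant to show how working in tag space could make the amount of added complexity explicit.

```python
import re
from typing import Callable, List

# Hypothetical tag vocabulary; the paper's actual tag space is not specified here.
KEYWORD_TAGS = {
    "sort": "task:sorting",
    "python": "language:python",
    "list": "data:list",
}

# Hypothetical pool of tags that could be added to raise complexity.
CANDIDATE_TAGS = [
    "constraint:in-place",
    "constraint:time-complexity",
    "format:unit-tests",
]


def compress(instruction: str) -> List[str]:
    """Map an instruction to tags by keyword lookup (stand-in for a learned tagger)."""
    words = set(re.findall(r"[a-z]+", instruction.lower()))
    return [tag for keyword, tag in KEYWORD_TAGS.items() if keyword in words]


def expand(
    tags: List[str],
    candidates: List[str],
    reward: Callable[[List[str]], float],
    budget: int = 1,
) -> List[str]:
    """Greedily add up to `budget` tags, each time picking the highest-reward one.

    Greedy selection is a simplification; it is not reinforcement learning.
    """
    expanded = list(tags)
    remaining = [c for c in candidates if c not in expanded]
    for _ in range(budget):
        if not remaining:
            break
        best = max(remaining, key=lambda c: reward(expanded + [c]))
        expanded.append(best)
        remaining.remove(best)
    return expanded


def reconstruct(tags: List[str]) -> str:
    """Render tags back into text (stand-in for a generator model)."""
    return "Write an instruction covering: " + ", ".join(tags)


def toy_reward(tags: List[str]) -> float:
    """Hypothetical reward: one point per tag, plus a bonus per constraint tag."""
    return len(tags) + sum(1 for t in tags if t.startswith("constraint:"))


if __name__ == "__main__":
    base = compress("Sort a list in Python")
    harder = expand(base, CANDIDATE_TAGS, toy_reward, budget=2)
    print(reconstruct(harder))
```

In this sketch, raising `budget` adds more tags before reconstruction, so the complexity increase is set by a single integer rather than by rewording a prompt.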