Tag-Evol: Achieving Efficient Instruction Evolving via Tag Injection
Tag-Evol addresses the problem of expensive and monolithic data synthesis for AI researchers and practitioners, though the contribution is incremental because it builds on Evol-Instruct.
The paper tackles the inefficiency and limited diversity of instruction evolving methods. It proposes Tag-Evol, which uses knowledge tags for controlled evolution, and reports significantly better evolved data than other methods across multiple benchmarks.
Evol-Instruct has brought significant improvements as a data synthesis method in several areas. Existing methods typically rely on a fixed set of evolution strategies, which require manual design and are monolithic in form. In addition, iterative evolution makes hard samples expensive to obtain. In view of this, we propose the Tag-Evol framework, a more diverse and efficient instruction evolving method. Specifically, Tag-Evol uses diverse and specific knowledge tags as strategies, achieving controlled evolution by injecting different combinations of tags into the original instructions. Experiments with multiple backbones on benchmarks from diverse domains show that the proposed method generates significantly better evolved data than other methods. Furthermore, we conduct a thorough analysis of the evolved data, demonstrating that Tag-Evol is not only efficient but also generates more diverse and challenging data.
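
The tag-injection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tag pool, the prompt wording, and the function names `sample_tags` and `build_evolve_prompt` are all hypothetical, and the call to a language model that would actually rewrite the instruction is left out. The sketch also assumes that the number of injected tags is one knob for controlling the evolution, which the abstract does not state explicitly.

```python
import random

# Hypothetical tag pool; the source does not list concrete knowledge tags.
TAG_POOL = [
    "dynamic programming",
    "modular arithmetic",
    "recursion",
    "string parsing",
    "probability",
]


def sample_tags(pool, k, rng):
    """Pick k distinct tags from the pool to inject into one instruction."""
    if not 0 < k <= len(pool):
        raise ValueError("k must be between 1 and the size of the tag pool")
    return rng.sample(pool, k)


def build_evolve_prompt(instruction, tags):
    """Compose a prompt asking a model to evolve the instruction using the tags.

    The wording is an assumed template, not the one used in the paper.
    """
    tag_lines = "\n".join(f"- {tag}" for tag in tags)
    return (
        "Rewrite the instruction below into a new instruction that also "
        "requires the listed knowledge tags.\n"
        f"Tags:\n{tag_lines}\n"
        f"Instruction: {instruction}\n"
    )


# Seeded generator so the sampled tag combinations are reproducible.
rng = random.Random(0)
seed_instruction = "Write a function that reverses a string."

# One prompt per tag count; each would be sent to a model in a single step.
prompts = {
    k: build_evolve_prompt(seed_instruction, sample_tags(TAG_POOL, k, rng))
    for k in (1, 2, 3)
}
```

Under these assumptions, each entry in `prompts` pairs the same seed instruction with a different tag combination, so a set of varied evolution requests comes from one sampling pass rather than from repeated rounds of a fixed strategy.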