LG · CL · Aug 27, 2024

Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

Simran Kaur, Simon Park, Anirudh Goyal, Sanjeev Arora

MILA

arXiv:2408.14774v424.125 citations · h-index: 35 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of efficient and effective instruction tuning for LLMs, though the advance is incremental because it builds on existing LLM capabilities.

The paper tackles the problem of creating diverse, high-quality instruction-tuning data for LLMs. It introduces Instruct-SkillMix, an automated pipeline that extracts skills from a strong LLM and generates data combining them. With only 4K examples, vanilla SFT on this data yields strong gains, such as a 42.76% length-controlled win rate on AlpacaEval 2.0 for LLaMA-3-8B-Base.

We introduce Instruct-SkillMix, an automated approach for creating diverse, high-quality SFT data for instruction-following. The pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to extract core "skills" for instruction-following by directly prompting the model. This is inspired by the "LLM metacognition" of Didolkar et al. (2024); (2) Data generation: uses the powerful LLM to generate (instruction, response) data that exhibit a randomly chosen pair of these skills. Here, the use of random skill combinations promotes diversity and difficulty. The estimated cost of creating the dataset is under $600. Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from Instruct-SkillMix leads to strong gains on instruction-following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench. With just 4K examples, LLaMA-3-8B-Base achieves a 42.76% length-controlled win rate on AlpacaEval 2.0, a level similar to frontier models like Claude 3 Opus and LLaMA-3.1-405B-Instruct. Ablation studies also suggest plausible reasons why creating open instruction-tuning datasets via naive crowd-sourcing has proved difficult. In our dataset, adding 20% low-quality answers ("shirkers") causes a noticeable degradation in performance. The Instruct-SkillMix pipeline seems flexible and adaptable to other settings.
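The data-generation stage described in the abstract (sample a random pair of skills, then ask a strong LLM for an instruction and response exhibiting both) can be illustrated with a minimal Python sketch. This is not the authors' code: the skill names, the prompt wording, and the helpers `sample_skill_pairs`, `build_prompt`, `generate_dataset`, and `fake_llm` are all hypothetical, and the LLM is replaced by a stub so the example runs offline.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical skill names, for illustration only. In the pipeline described
# above, the skill list is obtained by prompting a strong LLM (stage 1).
SKILLS = [
    "critical_thinking",
    "summarization",
    "code_explanation",
    "persuasive_writing",
    "data_interpretation",
]


def sample_skill_pairs(skills: List[str], n: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Draw n random pairs of distinct skills; a pair may recur across draws."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        first, second = rng.sample(skills, 2)  # two distinct skills
        pairs.append((first, second))
    return pairs


def build_prompt(pair: Tuple[str, str]) -> str:
    """Assumed prompt wording; the paper's actual prompts may differ."""
    first, second = pair
    return (
        "Write one challenging user instruction whose answer requires both "
        f"of these skills: {first} and {second}. "
        "Then write a high-quality response to it."
    )


def generate_dataset(
    skills: List[str], n: int, llm: Callable[[str], str], seed: int = 0
) -> List[Dict[str, object]]:
    """Stage 2 sketch: one LLM call per sampled skill pair."""
    rows = []
    for pair in sample_skill_pairs(skills, n, seed):
        prompt = build_prompt(pair)
        rows.append({"skills": pair, "prompt": prompt, "output": llm(prompt)})
    return rows


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs without network access."""
    return "Instruction: <placeholder>\nResponse: <placeholder>"


if __name__ == "__main__":
    for row in generate_dataset(SKILLS, n=3, llm=fake_llm):
        print(row["skills"])
```

With k skills there are k(k-1)/2 unordered pairs, so even a modest skill list yields many distinct combinations, which is one plausible reading of why random pairing promotes diversity. Swapping `fake_llm` for a real model call is left to the reader, since the source does not specify an API.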
