CL AI, Apr 6, 2023

Instruction Tuning with GPT-4

Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, Jianfeng Gao

Microsoft

arXiv:2304.03277v138.2824 citations h-index: 59 Has Code

Originality: Incremental advance

AI Analysis

This work improves instruction tuning for LLMs, potentially benefiting AI developers and researchers, though it appears incremental as it builds on prior methods with a newer model.

The researchers generated 52K English and Chinese instruction-following examples with GPT-4 instead of earlier models and used them to finetune LLaMA. In early experiments, the resulting models showed better zero-shot performance on new tasks than models tuned on data generated by previous state-of-the-art models.

Prior work has shown that finetuning large language models (LLMs) using machine-generated instruction-following data enables such models to achieve remarkable zero-shot capabilities on new tasks, and no human-written instructions are needed. In this paper, we present the first attempt to use GPT-4 to generate instruction-following data for LLM finetuning. Our early experiments on instruction-tuned LLaMA models show that the 52K English and Chinese instruction-following data generated by GPT-4 leads to superior zero-shot performance on new tasks to the instruction-following data generated by previous state-of-the-art models. We also collect feedback and comparison data from GPT-4 to enable a comprehensive evaluation and reward model training. We make our data generated using GPT-4 as well as our codebase publicly available.
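
As a rough illustration of what "instruction-following data" means in a finetuning pipeline, the sketch below turns one record into a prompt and a target string. The abstract does not specify the record layout or the prompt wording, so the field names (`instruction`, `input`, `output`), the section markers, and the helper `build_training_pair` are all assumptions made for this example, not the authors' released format.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    """One instruction-following record (field names are assumed, not from the paper)."""
    instruction: str  # the task description
    input: str        # optional context for the task; may be empty
    output: str       # the machine-generated response used as the training target


def build_training_pair(example: InstructionExample) -> tuple[str, str]:
    """Return (prompt, target) strings for supervised finetuning.

    The section markers below are a hypothetical template chosen for
    illustration; a real pipeline would use whatever template its
    training code expects.
    """
    parts = ["### Instruction:\n" + example.instruction.strip()]
    # Only include the input section when the record actually has context.
    if example.input.strip():
        parts.append("### Input:\n" + example.input.strip())
    prompt = "\n\n".join(parts) + "\n\n### Response:\n"
    return prompt, example.output.strip()
```

A caller would typically concatenate the prompt and target, tokenize the result, and compute the training loss only on the target tokens, though that choice is also an assumption rather than something the abstract states.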
