LG · AI · CL · Jan 24, 2024

Instruction Fine-Tuning: Does Prompt Loss Matter?

arXiv:2401.13586v4 · 38 citations · EMNLP
Originality: Incremental advance
AI Analysis

This research addresses a practical issue for fine-tuning API providers and users by highlighting the importance of the prompt loss weight (PLW) parameter in supervised instruction fine-tuning (SIFT). It is an incremental advance, since it builds on existing fine-tuning methods.

The study investigated the impact of prompt loss token weights (PLW) on supervised instruction fine-tuning (SIFT). It found that performance on short-completion data has a statistically significant negative quadratic relationship with PLW: small values (0.01-0.5) improved results on multiple-choice and short-generation benchmarks, while large values (~1.0) benefited long-generation benchmarks.

We present a novel study analyzing the effects of various prompt loss token weights (PLW) for supervised instruction fine-tuning (SIFT). While prompt-masking (PLW = 0) is common for SIFT, some fine-tuning APIs support fractional PLWs and suggest that using a small non-zero PLW can help stabilize learning when fine-tuning on short-completion data. However, there has never been a study confirming this claim, and OpenAI, a major cloud-based SIFT provider, recently removed this parameter from their fine-tuning API. We found that performance of models fine-tuned on short-completion data had a statistically-significant negative quadratic relationship with PLW. Using small values (0.01 - 0.5) of PLW produced better results on multiple-choice and short-generation benchmarks (outperforming models fine-tuned on long-completion data) while large values (~ 1.0) of PLW produced better results on long-generation benchmarks. We explained this effect and verified its importance through additional experiments. This research serves as a warning to API providers about the importance of providing a PLW parameter for SIFT.
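
To make the role of PLW concrete, the following is a minimal sketch of a per-sequence loss in which prompt tokens are weighted by PLW and completion tokens by 1. The function name `weighted_sift_loss`, the toy probabilities, and the choice to normalize by the sum of the weights are illustrative assumptions, not details confirmed by the abstract above or by any particular fine-tuning API.

```python
import math


def weighted_sift_loss(token_log_probs, is_prompt, plw):
    """Weighted negative log-likelihood for one training sequence.

    token_log_probs: log-probability the model assigned to each target token.
    is_prompt:       True for prompt tokens, False for completion tokens.
    plw:             prompt loss weight (0.0 = prompt masking, 1.0 = full prompt loss).

    Assumption: the loss is normalized by the sum of the token weights.
    """
    if len(token_log_probs) != len(is_prompt):
        raise ValueError("token_log_probs and is_prompt must have the same length")
    weights = [plw if prompt else 1.0 for prompt in is_prompt]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("sequence has no tokens with non-zero weight")
    weighted_nll = sum(-w * lp for w, lp in zip(weights, token_log_probs))
    return weighted_nll / total_weight


# Toy sequence (made-up numbers): 4 prompt tokens, then 2 completion tokens.
log_probs = [math.log(0.5)] * 4 + [math.log(0.25)] * 2
prompt_mask = [True] * 4 + [False] * 2

masked = weighted_sift_loss(log_probs, prompt_mask, plw=0.0)      # completion tokens only
fractional = weighted_sift_loss(log_probs, prompt_mask, plw=0.1)  # small prompt contribution
full = weighted_sift_loss(log_probs, prompt_mask, plw=1.0)        # all tokens weighted equally
```

With PLW = 0 only the completion tokens contribute, which corresponds to prompt masking. As PLW grows toward 1, the prompt tokens take up a larger share of the loss, and that share is largest when completions are short relative to prompts.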

