LG | CL | Sep 24, 2024

Supervised Fine-Tuning Achieve Rapid Task Adaption Via Alternating Attention Head Activation Patterns

arXiv:2409.15820v2 | 6 citations | h-index: 16
AI Analysis

This paper addresses the challenge of improving LLM adaptation to complex tasks with limited data. The contribution appears incremental: it builds on existing SFT methods and adds new insights into attention mechanisms.

The paper tackles LLMs' unsatisfactory performance on complex tasks, which stems from scarce instruction data. It analyzes how supervised fine-tuning (SFT) adapts LLMs through attention patterns and finds that LLMs selectively activate task-specific heads and that patterns for complex tasks combine basic ones. The authors then use these findings to improve the efficiency and effectiveness of SFT.

LLMs' performance on complex tasks is still unsatisfactory. A key issue is that LLMs currently learn in a data-driven manner, while instruction data for these complex tasks is both scarce and hard to collect or construct. In contrast, LLMs can learn simpler tasks rather quickly when adequate prior knowledge has been captured during the pretraining stage. If the prerequisites and mechanism of such rapid generalization could be elucidated, LLMs could learn complex tasks more efficiently and effectively. In this paper, we therefore employ a gradient-based method to dissect how SFT adapts LLMs to downstream tasks, from the perspective of attention patterns. We find that: (1) LLMs selectively activate task-specific attention heads during SFT; (2) activation patterns for complex tasks are combinations of basic task patterns; and (3) changes in a few parameters can significantly impact activation patterns after SFT on a small number of samples. Based on these insights, we conduct experiments to enhance the efficiency and effectiveness of SFT.
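To make the first two findings concrete, here is a minimal sketch in plain Python. It is not the paper's method: the abstract does not specify the attribution formula, so the sketch assumes a common gate-gradient scheme on a toy model where each head contributes one scalar and the prediction is the gated sum of those contributions. The names `gate_gradients`, `activation_pattern`, and `fit_two_patterns` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# One sample: (per-head scalar contributions, regression target).
# This toy representation is an assumption, not the paper's setup.
Sample = Tuple[Sequence[float], float]


def gate_gradients(head_outputs: Sequence[float], target: float) -> List[float]:
    """Gradient of the loss w.r.t. a multiplicative gate on each head.

    Toy model: prediction = sum_h g_h * o_h, evaluated at g_h = 1.
    Loss: 0.5 * (prediction - target) ** 2,
    so dLoss/dg_h = (prediction - target) * o_h.
    """
    residual = sum(head_outputs) - target
    return [residual * o for o in head_outputs]


def activation_pattern(samples: Sequence[Sample]) -> List[float]:
    """Sum |gate gradient| per head over all samples, normalised to sum to 1."""
    n_heads = len(samples[0][0])
    totals = [0.0] * n_heads
    for head_outputs, target in samples:
        for h, g in enumerate(gate_gradients(head_outputs, target)):
            totals[h] += abs(g)
    norm = sum(totals)
    return [t / norm if norm else 0.0 for t in totals]


def _dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def fit_two_patterns(
    complex_p: Sequence[float],
    basic_a: Sequence[float],
    basic_b: Sequence[float],
) -> Tuple[float, float]:
    """Least-squares weights (w_a, w_b) with complex_p ~ w_a*basic_a + w_b*basic_b.

    Solves the 2x2 normal equations directly.
    """
    aa, ab, bb = _dot(basic_a, basic_a), _dot(basic_a, basic_b), _dot(basic_b, basic_b)
    ac, bc = _dot(basic_a, complex_p), _dot(basic_b, complex_p)
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("basic patterns are (nearly) collinear")
    w_a = (ac * bb - bc * ab) / det
    w_b = (bc * aa - ac * ab) / det
    return w_a, w_b


# Toy data: task A only exercises head 0, task B only exercises head 2.
samples_a: List[Sample] = [([2.0, 0.0, 0.0], 1.0), ([1.0, 0.0, 0.0], 0.0)]
samples_b: List[Sample] = [([0.0, 0.0, 3.0], 1.0)]
pattern_a = activation_pattern(samples_a)
pattern_b = activation_pattern(samples_b)
weights = fit_two_patterns([0.25, 0.0, 0.75], pattern_a, pattern_b)
```

In a real transformer, the gate gradients would come from automatic differentiation through per-head masks rather than a closed-form expression, and the patterns would span every layer and head. The least-squares fit is only one possible way to test whether a complex-task pattern decomposes into basic-task patterns.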

