CL, AI | Jan 21, 2025

From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning

arXiv:2501.11877v1 | 11 citations | h-index: 32
Originality: Highly original
AI Analysis

This paper addresses the challenge of enhancing LLM capabilities efficiently. For AI researchers and practitioners, it offers a novel fine-tuning paradigm rather than an incremental refinement of existing methods.

The paper tackles the problem of improving large language model performance without scaling data or model size by introducing Aggregation Fine-Tuning (AFT), where models learn to synthesize multiple draft responses into refined answers, resulting in a 41.3% LC win rate on AlpacaEval 2 with a model fine-tuned from Llama3.1-8B-Base.

Scaling data and model size has been proven effective for boosting the performance of large language models. In addition to training-time scaling, recent studies have revealed that increasing test-time computational resources can further improve performance. In this work, we introduce Aggregation Fine-Tuning (AFT), a supervised fine-tuning paradigm where the model learns to synthesize multiple draft responses, referred to as proposals, into a single, refined answer, termed aggregation. At inference time, a propose-and-aggregate strategy further boosts performance by iteratively generating proposals and aggregating them. Empirical evaluations on benchmark datasets show that AFT-trained models substantially outperform standard SFT. Notably, an AFT model, fine-tuned from Llama3.1-8B-Base with only 64k data, achieves a 41.3% LC win rate on AlpacaEval 2, surpassing significantly larger LLMs such as Llama3.1-405B-Instruct and GPT-4. By combining sequential refinement and parallel sampling, the propose-and-aggregate framework scales inference-time computation in a flexible manner. Overall, these findings position AFT as a promising approach to unlocking additional capabilities of LLMs without resorting to increasing data volume or model size.
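The propose-and-aggregate strategy described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `generate` callable, the prompt template in `build_aggregation_prompt`, and the parameters `num_proposals` and `num_rounds` are all hypothetical names introduced here. Parallel sampling corresponds to drawing several proposals per step; sequential refinement corresponds to feeding aggregated outputs back in as the next set of proposals.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt to one sampled completion.
Generate = Callable[[str], str]


def build_aggregation_prompt(query: str, proposals: List[str]) -> str:
    """Combine the query and numbered drafts into one prompt.

    The template is an assumption for illustration; the paper's actual
    aggregation prompt is not given in this summary.
    """
    drafts = "\n\n".join(
        f"Draft {i + 1}:\n{text}" for i, text in enumerate(proposals)
    )
    return (
        f"Query:\n{query}\n\n{drafts}\n\n"
        "Write one refined answer that synthesizes the drafts above."
    )


def propose_and_aggregate(
    query: str,
    generate: Generate,
    num_proposals: int = 3,
    num_rounds: int = 1,
) -> str:
    """Sample drafts, optionally refine them over several rounds, then aggregate."""
    if num_proposals < 1 or num_rounds < 0:
        raise ValueError("num_proposals must be >= 1 and num_rounds >= 0")

    # Parallel sampling: independent drafts for the original query.
    proposals = [generate(query) for _ in range(num_proposals)]

    # Sequential refinement: each round aggregates the previous drafts
    # several times, producing a new set of proposals.
    for _ in range(num_rounds):
        prompt = build_aggregation_prompt(query, proposals)
        proposals = [generate(prompt) for _ in range(num_proposals)]

    # Final aggregation into a single answer.
    return generate(build_aggregation_prompt(query, proposals))


if __name__ == "__main__":
    # Stand-in for a real model call, used only to show the control flow.
    def echo_model(prompt: str) -> str:
        return f"answer based on {len(prompt)} prompt characters"

    print(propose_and_aggregate("What is AFT?", echo_model))
```

In this sketch the number of model calls is `num_proposals * (1 + num_rounds) + 1`, so the two parameters give independent knobs for widening (more drafts) or deepening (more rounds) inference-time computation.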

Code Implementations: 1 repo
