CL, Jul 29, 2024

Cool-Fusion: Fuse Large Language Models without Training

arXiv:2407.19807v2, 12 citations, h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses the high computational cost of model fusion in LLM applications, though it appears incremental because it builds on ensemble methods.

The paper tackles the problem of fusing heterogeneous large language models (LLMs) to leverage their complementary strengths without training. It proposes Cool-Fusion, which improves accuracy on GSM8K by 17.4% over three source LLMs.

We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to leverage their complementary strengths. One of the challenges of model fusion is its high computational load, specifically for fine-tuning or aligning vocabularies. To address this, we propose Cool-Fusion, a simple yet effective approach that fuses the knowledge of source LLMs without requiring any training. Unlike ensemble methods, Cool-Fusion is applicable to any set of source LLMs that have different vocabularies. To overcome the vocabulary discrepancies among LLMs, we ensemble LLMs at the text level, allowing them to rerank each other's generated texts at different granularities. Extensive experiments have been conducted across a variety of benchmark datasets. On GSM8K, Cool-Fusion increases accuracy over three strong source LLMs by a significant margin of 17.4%.
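To make the idea of text-level reranking concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes each source model is wrapped in two hypothetical callables: a generator that proposes the next text segment for a prefix, and a scorer that returns a lower-is-better value (for example, a perplexity) for a segment given that prefix. Averaging the scores across models and the names `fuse_step` and `fuse_generate` are also assumptions made for illustration. Because candidates are compared as plain text, no shared vocabulary is needed.

```python
from typing import Callable, Sequence

# Hypothetical interfaces (not from the paper):
# a Generator proposes the next text segment for a given prefix;
# a Scorer rates a segment given the prefix, where lower is better.
Generator = Callable[[str], str]
Scorer = Callable[[str, str], float]


def fuse_step(prefix: str,
              generators: Sequence[Generator],
              scorers: Sequence[Scorer]) -> str:
    """Return the candidate segment with the lowest mean score over all scorers."""
    candidates = [generate(prefix) for generate in generators]

    def mean_score(segment: str) -> float:
        return sum(score(prefix, segment) for score in scorers) / len(scorers)

    return min(candidates, key=mean_score)


def fuse_generate(prompt: str,
                  generators: Sequence[Generator],
                  scorers: Sequence[Scorer],
                  max_steps: int = 8) -> str:
    """Repeatedly append the jointly preferred segment to the running text."""
    text = prompt
    for _ in range(max_steps):
        segment = fuse_step(text, generators, scorers)
        if not segment:  # an empty segment is treated as end of generation
            break
        text += segment
    return text


# Toy stand-ins for two source models; real models would replace these.
toy_generators = [
    lambda p: "" if p.endswith("4") else " 4",
    lambda p: "" if p.endswith("4") else " 5",
]
toy_scorers = [
    lambda p, s: 0.1 if s == " 4" else 0.9,
    lambda p, s: 0.2 if s == " 4" else 0.3,
]
fused = fuse_generate("2 + 2 =", toy_generators, toy_scorers)
```

In this sketch the segment length is whatever each generator returns, so the reranking granularity (word, phrase, or full answer) is a property of the wrappers rather than of the fusion loop.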

