CL, LG · Jun 25, 2024

CharED: Character-wise Ensemble Decoding for Large Language Models

arXiv:2407.11009v1 · 3 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the practical challenge of efficiently ensembling diverse LLMs, though it is incremental in that it builds on existing ensembling concepts.

The paper tackles the problem of combining multiple large language models (LLMs) at inference time when they do not share a vocabulary or tokenizer. It proposes CharED, a character-wise ensemble decoding method that averages the models' outputs character by character, and reports improved performance over the individual models on coding, math, and toxicity benchmarks.

Large language models (LLMs) have shown remarkable potential for problem solving, with open-source models achieving increasingly impressive performance on benchmarks measuring areas from logical reasoning to mathematical ability. Ensembling models can further improve capabilities across a variety of domains. However, conventional methods of combining models at inference time, such as shallow fusion, necessitate a shared vocabulary and tokenization, and alternatives like fine-tuning for domain-specific performance are both time-consuming and computationally expensive. We therefore present an inference-time ensembling algorithm aimed at "averaging" outputs from multiple LLMs and illustrate its improved performance across multiple domains compared to its constituent models alone. Character-wise ensemble decoding, CharED, finds the marginal distribution of each character for an individual model and performs a weighted average to generate an output, character by character. In coding, math, and toxicity benchmarks, we find that our proposed method is able to combine the complementary strengths of multiple LLMs, regardless of vocabulary, tokenization, or model size.
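The character-wise averaging described in the abstract can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' implementation. Each model is treated as a hypothetical callable that maps the text so far to a dictionary of next-token probabilities keyed by token string. A model's next-character marginal is approximated by summing the probability of every token that begins with that character. The sketch then re-queries every model on the full text after each emitted character and picks the highest-scoring character greedily. The function names `char_marginal`, `ensemble_next_char`, and `charwise_decode` are illustrative, and how the paper handles tokens spanning several characters may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical model interface: text so far -> {token string: probability}.
NextTokenFn = Callable[[str], Dict[str, float]]


def char_marginal(token_probs: Dict[str, float]) -> Dict[str, float]:
    """Approximate one model's next-character distribution.

    Simplifying assumption: P(c) is the total probability of all tokens whose
    first character is c, renormalized over non-empty tokens.
    """
    marginal: Dict[str, float] = defaultdict(float)
    for token, p in token_probs.items():
        if token:  # an empty token contributes no character
            marginal[token[0]] += p
    total = sum(marginal.values())
    if total == 0:
        return {}
    return {c: p / total for c, p in marginal.items()}


def ensemble_next_char(dists: List[Dict[str, float]], weights: List[float]) -> str:
    """Weighted average of per-model character marginals; return the argmax."""
    if len(dists) != len(weights):
        raise ValueError("need exactly one weight per model")
    combined: Dict[str, float] = defaultdict(float)
    for dist, w in zip(dists, weights):
        for c, p in char_marginal(dist).items():
            combined[c] += w * p
    return max(combined, key=combined.get)  # raises ValueError if no candidates


def charwise_decode(models: List[NextTokenFn], weights: List[float],
                    prompt: str, max_chars: int, stop: str = "\n") -> str:
    """Greedy character-by-character decoding over an ensemble of models."""
    text = prompt
    for _ in range(max_chars):
        # Every model is queried on the same text, whatever its tokenizer.
        c = ensemble_next_char([m(text) for m in models], weights)
        text += c
        if c == stop:
            break
    return text[len(prompt):]


if __name__ == "__main__":
    # Two toy next-token distributions over different vocabularies.
    model_a = {"def": 0.6, "class": 0.3, "x": 0.1}
    model_b = {"d": 0.2, "c": 0.5, "de": 0.1, "x": 0.2}
    print(ensemble_next_char([model_a, model_b], [0.5, 0.5]))  # -> d
    print(ensemble_next_char([model_a, model_b], [0.2, 0.8]))  # -> c
```

With equal weights the ensemble picks `d` (about 0.45 versus 0.40 for `c`). Shifting weight toward `model_b` flips the choice to `c` (about 0.46 versus 0.36), even though the two toy vocabularies share no multi-character tokens. Averaging at the character level is what makes such models comparable at all.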

