CL · Apr 15, 2024

Bridging the Gap between Different Vocabularies for LLM Ensemble

arXiv:2404.09492v122.552 citations · h-index: 8 · Has Code · NAACL

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in LLM ensembling for researchers and practitioners: it allows outputs to be corrected dynamically during generation rather than only after they are complete. The advance is incremental, since it builds on existing ensemble concepts.

The paper tackles vocabulary discrepancies between large language models, which previously limited ensemble methods to selecting or blending complete outputs. It proposes EVA, which aligns the models' vocabularies so that they can be ensembled at each generation step. It reports better results than individual models and prior ensemble methods on tasks such as commonsense reasoning and machine translation.

Ensembling different large language models (LLMs) to unleash their complementary potential and harness their individual strengths is highly valuable. Nevertheless, vocabulary discrepancies among various LLMs have constrained previous studies to either selecting or blending completely generated outputs. This limitation hinders the dynamic correction and enhancement of outputs during the generation process, resulting in a limited capacity for effective ensemble. To address this issue, we propose a novel method to Ensemble LLMs via Vocabulary Alignment (EVA). EVA bridges the lexical gap among various LLMs, enabling meticulous ensemble at each generation step. Specifically, we first learn mappings between the vocabularies of different LLMs with the assistance of overlapping tokens. Subsequently, these mappings are employed to project output distributions of LLMs into a unified space, facilitating a fine-grained ensemble. Finally, we design a filtering strategy to exclude models that generate unfaithful tokens. Experimental results on commonsense reasoning, arithmetic reasoning, machine translation, and data-to-text generation tasks demonstrate the superiority of our approach compared with individual LLMs and previous ensemble methods conducted on complete outputs. Further analyses confirm that our approach can leverage knowledge from different language models and yield consistent improvement.
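The pipeline described in the abstract (project each model's next-token distribution into a shared vocabulary, filter out unfaithful models, then combine) can be illustrated with a toy sketch. This is not the authors' implementation. The mapping here is hand-written rather than learned from overlapping tokens, and the filter rule (drop a model whose top token is not among the top-k tokens of any other model) is a hypothetical stand-in, since the abstract does not specify the actual strategy. All names, tokens, and probabilities are illustrative assumptions.

```python
from typing import Dict, List

Dist = Dict[str, float]  # token -> probability


def project(dist: Dist, mapping: Dict[str, Dict[str, float]]) -> Dist:
    """Project a distribution over a source vocabulary onto the pivot vocabulary.

    mapping[src_token] is a distribution over pivot tokens (weights sum to 1).
    In EVA the mapping is learned; here it is supplied by hand (assumption).
    """
    out: Dist = {}
    for src_tok, p in dist.items():
        for pivot_tok, w in mapping[src_tok].items():
            out[pivot_tok] = out.get(pivot_tok, 0.0) + p * w
    return out


def top_tokens(dist: Dist, k: int) -> List[str]:
    """Return the k most probable tokens."""
    return sorted(dist, key=dist.get, reverse=True)[:k]


def ensemble_step(dists: List[Dist], top_k: int = 1) -> Dist:
    """Average pivot-space distributions for one generation step.

    Hypothetical filter: a model is kept only if its top token appears in
    the top_k tokens of at least one other model.
    """
    kept = []
    for i, d in enumerate(dists):
        best = top_tokens(d, 1)[0]
        others = [o for j, o in enumerate(dists) if j != i]
        if any(best in top_tokens(o, top_k) for o in others):
            kept.append(d)
    if not kept:  # no agreement at all: fall back to using every model
        kept = dists
    vocab = set().union(*kept)
    return {tok: sum(d.get(tok, 0.0) for d in kept) / len(kept) for tok in vocab}


# Model A already uses the pivot vocabulary.
dist_a = {"Paris": 0.6, "York": 0.3, "New": 0.1}

# Model B has a token ("NY") that is absent from the pivot vocabulary.
mapping_b = {"Paris": {"Paris": 1.0}, "NY": {"New": 0.5, "York": 0.5}}
dist_b = project({"Paris": 0.7, "NY": 0.3}, mapping_b)

# Model C disagrees with both A and B and is dropped by the toy filter.
dist_c = {"New": 0.8, "Paris": 0.1, "York": 0.1}

combined = ensemble_step([dist_a, dist_b, dist_c], top_k=1)
next_token = top_tokens(combined, 1)[0]
print(next_token, combined)
```

In this toy run, models A and B agree on the top token and are averaged, while model C is excluded. In a real decoding loop the chosen token would be appended to every model's context before the next step, which is what allows correction during generation rather than after it.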
