CL · Jul 15, 2024

Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping

Wenhao Zhu, Sizhe Liu, Shujian Huang, Shuaijie She, Chris Wendler, Jiajun Chen

arXiv:2407.10795v1 · 14.425 citations · h-index: 34 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in multilingual reasoning with large language models and represents an incremental improvement over existing contrastive decoding methods.

The authors address the poor performance of contrastive decoding on non-English tasks. Their improved algorithm skips language-agnostic layers so that the early (amateur) and final (expert) outputs are better aligned in language, which yields substantial gains in chain-of-thought reasoning accuracy across 11 languages.

Decoding by Contrasting Layers (DoLa) is designed to improve the generation quality of large language models (LLMs) by contrasting the prediction probabilities between an early exit output (amateur logits) and the final output (expert logits). However, we find that this approach does not work well on non-English tasks. Inspired by previous interpretability work on language transition during the model's forward pass, we discover that this issue arises from a language mismatch between the early exit output and the final output. In this work, we propose an improved contrastive decoding algorithm that is effective for diverse languages beyond English. To obtain more helpful amateur logits, we devise two strategies to skip a set of bottom, language-agnostic layers based on our preliminary analysis. Experimental results on multilingual reasoning benchmarks demonstrate that our proposed method outperforms previous contrastive decoding baselines and substantially improves LLMs' chain-of-thought reasoning accuracy across 11 languages. The project will be available at: https://github.com/NJUNLP/SkipLayerCD.
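The abstract describes the mechanics only at a high level, so the following is a minimal, illustrative Python sketch and not the authors' implementation (their repository is the reference for that). It rests on several assumptions. The toy model's layers are plain functions, and its final hidden state is used directly as logits (an identity LM head). The amateur pass runs the same layer stack with a contiguous block of lower layers skipped; the paper's two specific skipping strategies are not reproduced. The score is the DoLa-style contrast log p_expert minus log p_amateur, with the adaptive plausibility mask used in DoLa and earlier contrastive decoding work. All function names and the `alpha` default are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def log_softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def forward(h: Vector, layers: Sequence[Layer], skip: range = range(0)) -> Vector:
    """Run h through every layer whose index is not in `skip`.

    Toy assumption: the final hidden state is used directly as logits
    (identity LM head), so hidden size equals vocabulary size.
    """
    for i, layer in enumerate(layers):
        if i not in skip:
            h = layer(h)
    return h


def contrastive_scores(expert: Vector, amateur: Vector, alpha: float = 0.1) -> Vector:
    """Score each token as log p_expert - log p_amateur.

    Tokens whose expert probability is below alpha * (max expert probability)
    are masked to -inf, so the contrast cannot promote implausible tokens.
    """
    lp_e = log_softmax(expert)
    lp_a = log_softmax(amateur)
    cutoff = math.log(alpha) + max(lp_e)
    return [e - a if e >= cutoff else -math.inf for e, a in zip(lp_e, lp_a)]


def skip_layer_cd_step(h0: Vector, layers: Sequence[Layer], skip: range,
                       alpha: float = 0.1) -> int:
    """Greedy next-token choice (hypothetical helper).

    Expert logits come from the full pass; amateur logits come from the same
    pass with the layers in `skip` removed, read from the final layer.
    """
    expert = forward(h0, layers)
    amateur = forward(h0, layers, skip)
    scores = contrastive_scores(expert, amateur, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


def make_bias_layer(bias: Vector) -> Layer:
    """Hypothetical toy layer: adds a fixed bias vector to the hidden state."""
    return lambda h: [x + b for x, b in zip(h, bias)]


if __name__ == "__main__":
    toy_layers = [make_bias_layer(b) for b in (
        [2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0])]
    # Expert logits are [3, 2, 1] (greedy decoding would pick token 0).
    # Skipping layers 1-2 gives amateur logits [3, 0, 0], so the contrast picks token 1.
    print(skip_layer_cd_step([0.0, 0.0, 0.0], toy_layers, skip=range(1, 3)))  # prints 1
```

Reading the amateur logits from the final layer of a shortened pass, rather than from an early exit as in DoLa, is one way to keep both distributions in the same output language. That is the mismatch the abstract identifies, but the exact layer choice in this toy is arbitrary.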

