CL, LG · Jul 3, 2024

How Does Quantization Affect Multilingual LLMs?

arXiv:2407.03211v2 · 37 citations · h-index: 56
AI Analysis

This work addresses equitable performance across languages in efficient NLP deployment, a problem that is critical for global adoption. It is incremental: it extends existing quantization analysis to multilingual contexts.

The study analyzes the impact of quantization on multilingual large language models and finds that automatic metrics underestimate its harmful effects: a 1.7% average drop across automatic tasks in Japanese corresponds to a 16.0% drop in human evaluation. Non-Latin script languages are the worst affected.

Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantization on LLMs in English, none have evaluated across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on performance across languages and at varying scales. We use automatic benchmarks, LLM-as-a-Judge, and human evaluation, finding that (1) harmful effects of quantization are apparent in human evaluation, which automatic metrics severely underestimate: a 1.7% average drop in Japanese across automatic tasks corresponds to a 16.0% drop reported by human evaluators on realistic prompts; (2) languages are disparately affected by quantization, with non-Latin script languages impacted worst; and (3) challenging tasks like mathematical reasoning degrade fastest. As the ability to serve low-compute models is critical for wide global adoption of NLP technologies, our results urge consideration of multilingual performance as a key evaluation criterion for efficient models.

