CL LG Jul 3, 2024

How Does Quantization Affect Multilingual LLMs?

Kelly Marchisio, Saurabh Dash, Hongyu Chen, Dennis Aumiller, Ahmet Üstün, Sara Hooker, Sebastian Ruder

arXiv:2407.03211v219.437 citations h-index: 56

Originality: Incremental advance

AI Analysis

This work addresses equitable performance across languages in efficient NLP deployment, which is critical for global adoption. It is an incremental advance: it extends existing quantization analysis, which has focused on English, to multilingual settings.

The study analyzes the impact of quantization on multilingual large language models. It finds that automatic metrics underestimate the harm: a 1.7% average drop in Japanese across automatic tasks corresponds to a 16.0% drop reported by human evaluators. Non-Latin script languages are affected worst.

Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantization on LLMs in English, none have evaluated across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on performance across languages and at varying scales. We use automatic benchmarks, LLM-as-a-Judge, and human evaluation, finding that (1) harmful effects of quantization are apparent in human evaluation, which automatic metrics severely underestimate: a 1.7% average drop in Japanese across automatic tasks corresponds to a 16.0% drop reported by human evaluators on realistic prompts; (2) languages are disparately affected by quantization, with non-Latin script languages impacted worst; and (3) challenging tasks like mathematical reasoning degrade fastest. As the ability to serve low-compute models is critical for wide global adoption of NLP technologies, our results urge consideration of multilingual performance as a key evaluation criterion for efficient models.
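
The abstract does not say which quantization schemes were studied, so the following is only a generic illustration of what weight quantization does, not the authors' method. It is a minimal sketch of symmetric absmax int8 quantization in plain Python; the function names `quantize_int8` and `dequantize` are hypothetical and chosen for this example.

```python
def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using one shared scale.

    Illustrative assumption: symmetric, per-tensor, round-to-nearest.
    """
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight is stored as one small integer plus a shared scale, which is where the memory and speed savings come from. The rounding error it introduces is the kind of perturbation whose downstream effect the paper measures across languages.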
