CL AI Sep 28, 2025

The Hidden Costs of Translation Accuracy: Distillation, Quantization, and Environmental Impact

arXiv:2509.23990

Originality: Synthesis-oriented

AI Analysis

The paper addresses the computational and environmental costs of NLP, a concern for both researchers and practitioners, though its contribution is incremental: it applies existing compression methods to translation.

This study examines the trade-offs between translation quality and efficiency by comparing full-scale, distilled, and quantized models. The distilled model cut inference time by up to 78% and carbon emissions by up to 65% with minimal loss in BLEU, and human evaluators found that even INT4 quantization preserved high accuracy and fluency.

The rapid expansion of large language models (LLMs) has heightened concerns about their computational and environmental costs. This study investigates the trade-offs between translation quality and efficiency by comparing full-scale, distilled, and quantized models using machine translation as a case study. We evaluated performance on the Flores+ benchmark and through human judgments of conversational translations in French, Hindi, and Kannada. Our analysis revealed that the full 3.3B FP32 model, while achieving the highest BLEU scores, incurred the largest environmental footprint (approximately 0.007-0.008 kg CO2 per run). The distilled 600M FP32 model reduced inference time by 71-78% and carbon emissions by 63-65% compared with the full model, with only minimal reductions in BLEU scores. Human evaluations further showed that even aggressive quantization (INT4) preserved high levels of accuracy and fluency, with differences between models generally minor. These findings demonstrate that model compression strategies can substantially reduce computational demands and environmental impact while maintaining competitive translation quality, though trade-offs are more pronounced in low-resource settings. We argue for evaluation frameworks that integrate efficiency and sustainability alongside accuracy as central dimensions of progress in NLP.
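To make the reported percentages concrete, the arithmetic behind a relative reduction can be sketched as below. This is an illustrative calculation only, not the authors' measurement code: the function names are hypothetical, and the sketch assumes the 63-65% emissions reduction applies to the full model's 0.007-0.008 kg CO2 per-run range quoted in the abstract. The resulting range is an implied bound, not a figure reported by the paper.

```python
from itertools import product


def relative_reduction(baseline: float, compressed: float) -> float:
    """Fractional reduction of `compressed` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - compressed) / baseline


def implied_value(baseline: float, reduction: float) -> float:
    """Value left after applying a fractional reduction to a baseline."""
    return baseline * (1.0 - reduction)


def implied_range(baselines, reductions):
    """Smallest and largest implied values over all baseline/reduction pairs."""
    values = [implied_value(b, r) for b, r in product(baselines, reductions)]
    return min(values), max(values)


# Figures quoted in the abstract (full 3.3B FP32 model vs. distilled 600M FP32).
full_model_kg_co2 = (0.007, 0.008)   # per run, full model
emission_reduction = (0.63, 0.65)    # 63-65% reduction for the distilled model

# Assumption: any reduction in the range may pair with any baseline in the range.
low, high = implied_range(full_model_kg_co2, emission_reduction)
print(f"Implied distilled-model footprint: {low:.5f} to {high:.5f} kg CO2 per run")
```

Under these assumptions the distilled model's implied footprint falls somewhere between roughly 0.0025 and 0.0030 kg CO2 per run. The same `relative_reduction` helper applies to the inference-time figures if the raw timings are available.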
