CL LG Feb 10

Life Cycle-Aware Evaluation of Knowledge Distillation for Machine Translation: Environmental Impact and Translation Quality Trade-offs

arXiv:2602.09691v1
h-index: 5
Originality: Incremental advance
AI Analysis

This work helps machine translation practitioners select a knowledge distillation method under compute-induced constraints by providing a reproducible protocol for balancing environmental impact against translation quality.

The study evaluates knowledge distillation methods for machine translation on both translation quality and computational cost, with cost expressed as a carbon footprint computed by a life cycle assessment tool. It finds that distillation overhead dominates the footprint at small deployment volumes, that inference dominates at scale, and that word-level distillation typically offers better footprint-quality trade-offs than sequence-level distillation.

Knowledge distillation (KD) is a tool to compress a larger system (teacher) into a smaller one (student). In machine translation, studies typically report only the translation quality of the student and omit the computational complexity of performing KD, making it difficult to select among the many available KD choices under compute-induced constraints. In this study, we evaluate representative KD methods by considering both translation quality and computational cost. We express computational cost as a carbon footprint using the machine learning life cycle assessment (MLCA) tool. This assessment accounts for runtime operational emissions and amortized hardware production costs throughout the KD model life cycle (teacher training, distillation, and inference). We find that (i) distillation overhead dominates the total footprint at small deployment volumes, (ii) inference dominates at scale, making KD beneficial only beyond a task-dependent usage threshold, and (iii) word-level distillation typically offers more favorable footprint-quality trade-offs than sequence-level distillation. Our protocol provides reproducible guidance for selecting KD methods under explicit quality and compute-induced constraints.
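The usage threshold in finding (ii) can be illustrated with a simple linear footprint model. The sketch below is not the MLCA tool and does not reproduce the paper's protocol. It assumes the comparison is between deploying the teacher directly and distilling then deploying the student, that each life cycle splits into a one-off part and a per-request part, and that operational and amortized hardware emissions are already folded into each number. The names `Footprint` and `break_even_requests` and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Footprint:
    """Linear footprint model (an assumption, not the paper's MLCA tool)."""
    upfront_kg: float      # one-off emissions (training, distillation), kg CO2e
    per_request_kg: float  # emissions per translation request, kg CO2e

    def total(self, n_requests: int) -> float:
        # Total life cycle footprint after serving n_requests.
        return self.upfront_kg + self.per_request_kg * n_requests


def break_even_requests(teacher: Footprint, student: Footprint) -> float:
    """Request volume beyond which the student's total footprint is lower.

    Returns 0.0 if the student is never worse, and math.inf if the
    student never recovers its extra upfront cost.
    """
    extra_upfront = student.upfront_kg - teacher.upfront_kg
    saving_per_request = teacher.per_request_kg - student.per_request_kg
    if extra_upfront <= 0:
        return 0.0
    if saving_per_request <= 0:
        return math.inf
    return extra_upfront / saving_per_request


# Hypothetical values, chosen only to make the arithmetic easy to follow.
TEACHER_TRAINING_KG = 100.0  # paid in both scenarios
DISTILLATION_KG = 20.0       # extra one-off cost of producing the student

teacher = Footprint(upfront_kg=TEACHER_TRAINING_KG, per_request_kg=2e-4)
student = Footprint(
    upfront_kg=TEACHER_TRAINING_KG + DISTILLATION_KG,
    per_request_kg=5e-5,
)

threshold = break_even_requests(teacher, student)
print(f"Break-even volume: {threshold:,.0f} requests")
```

Under these assumed numbers, the student's footprint is higher below the threshold because the distillation overhead has not yet been recovered, and lower above it because inference savings accumulate with every request. A real assessment would replace the constants with measured values for each KD method and task.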

