CompactQE: Interpretable Translation Quality Estimation via Small Open-Weight LLMs
This work addresses the data privacy and cost concerns of LLM-based quality estimation by demonstrating that smaller, open-weight models are a viable alternative to large proprietary LLMs for practitioners.
CompactQE shows that small open-weight LLMs (<30B parameters) can match or exceed the performance of large proprietary models in machine translation quality estimation, achieving system-level correlations with human judgments that exceed those of traditional metrics and of human inter-annotator agreement.
Current state-of-the-art Quality Estimation (QE) in machine translation relies on massive, proprietary LLMs, raising data privacy concerns. We demonstrate that smaller, open-source LLMs (<30B parameters) are a viable, cost-effective, and privacy-preserving alternative. Using a single-pass prompting strategy, our models simultaneously generate quality scores, MQM error annotations, suggested error corrections, and full post-editions. Our analysis shows that these models achieve highly competitive system-level correlations with human judgments, exceeding those of traditional neural metrics, fine-tuned models, and human inter-annotator agreement, and effectively approximating the capabilities of much larger proprietary LLMs.
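
The abstract does not specify the prompt or the output format, so the following is only a minimal sketch of what a single-pass setup could look like: one prompt requests the quality score, MQM-style error annotations, suggested corrections, and the full post-edition together, and one parser reads them back. Everything here is an assumption made for illustration, including the function and class names (`build_prompt`, `parse_response`, `QEResult`, `MQMError`), the JSON key names, the prompt wording, and the 0-100 score scale.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MQMError:
    span: str        # erroneous text in the translation
    category: str    # MQM-style category, e.g. "accuracy/mistranslation"
    severity: str    # e.g. "minor", "major", "critical"
    correction: str  # suggested fix for the span


@dataclass
class QEResult:
    score: float
    errors: list[MQMError] = field(default_factory=list)
    post_edit: str = ""


def build_prompt(source: str, translation: str, src_lang: str, tgt_lang: str) -> str:
    """Assemble one prompt requesting all four outputs at once.

    The wording and the 0-100 scale are hypothetical, not the paper's prompt.
    """
    return (
        f"Evaluate this {src_lang}-to-{tgt_lang} translation.\n"
        f"Source: {source}\n"
        f"Translation: {translation}\n"
        "Reply with a single JSON object with keys: "
        '"score" (0-100), "errors" (list of objects with "span", "category", '
        '"severity", "correction"), and "post_edit" (the fully corrected translation).'
    )


def parse_response(raw: str) -> QEResult:
    """Parse the model's JSON reply (assumed schema) into a typed result."""
    data = json.loads(raw)
    errors = [MQMError(**e) for e in data.get("errors", [])]
    return QEResult(
        score=float(data["score"]),
        errors=errors,
        post_edit=data.get("post_edit", ""),
    )
```

In use, the string from `build_prompt` would be sent to a locally hosted model and the reply passed to `parse_response`. Requesting all outputs in one response keeps the score, the error annotations, and the post-edition tied to the same model pass, at the cost of depending on the model to emit well-formed JSON.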