CL HC May 29

Translation Analytics for Freelancers II: Benchmarking Local LLMs for Confidential Translation Workflows

Yuri Balashov, Rex VanHorn, Mingxi Xu, Austin Downes

arXiv:2605.31452

AI Analysis

This work gives freelance translators and smaller language service providers practical evaluation methods for selecting local LLMs for confidentiality-sensitive translation, where privacy constraints rule out cloud-based engines and commercial LLMs.

This paper benchmarks locally runnable large language models (LLMs) for confidential translation workflows. It expands the Reeve Foundation Trilingual Corpus (RFTC) into a multilingual corpus (RFMC) by adding sentence-aligned German and Simplified Chinese reference translations. In benchmarks across four language directions on 1000+ sentences, the best local LLMs matched or surpassed local NMT systems and a frontier LLM (GPT-5.2), though they remained behind the top commercial NMTs.

Building on our previous work, this paper develops practical, low-barrier methods for freelance translators and smaller language service providers to evaluate translation technologies using rigorous yet accessible analytic methods. Here we address a high-stakes, specialized need: offline translation for confidentiality-sensitive domains in which privacy constraints preclude the use of cloud-based engines and commercial LLMs. We expand the Reeve Foundation Trilingual Corpus (RFTC) used in our previous work into a multilingual corpus (RFMC) by adding sentence-aligned German and Simplified Chinese reference translations. We then benchmark several locally runnable language models (via Ollama) across four language directions on 1000+ sentences selected from this corpus. We use consistent single-prompt calls without fine-tuning or domain adaptation, comparing local LLM outputs against commercial NMTs (DeepL, Baidu), a frontier LLM (GPT-5.2), and professional-grade local NMT systems (OPUS-CAT, NeuralDesktop, Promt). Automatic evaluation is conducted with MATEO. Results reveal substantial variation in local LLM performance across language directions and model sizes. The best local LLMs match or surpass local NMT systems and a frontier LLM, though they remain behind top commercial NMTs. These findings underscore the viability of carefully selected local LLM translation for privacy-constrained professionals and inform future research on model scaling and multilingual capability.
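The "consistent single-prompt calls" to locally served models described above might look roughly like the following sketch. It assumes an Ollama server on its default local port and uses the non-streaming `/api/generate` endpoint. The prompt wording, the function names, and the timeout are illustrative assumptions, not the authors' actual setup, which the abstract does not give.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical prompt template; the paper's actual prompt is not given in the abstract.
PROMPT_TEMPLATE = (
    "Translate the following sentence from {src} into {tgt}. "
    "Return only the translation.\n\n{text}"
)


def build_prompt(src, tgt, text):
    """Fill the same template for every sentence so that calls stay comparable."""
    return PROMPT_TEMPLATE.format(src=src, tgt=tgt, text=text)


def build_payload(model, prompt):
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def translate(model, src, tgt, text, url=OLLAMA_URL, timeout=120):
    """Send one sentence to a locally served model and return its raw output."""
    payload = build_payload(model, build_prompt(src, tgt, text))
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # The request goes to localhost only, so the source text stays on the machine.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"].strip()
```

A call such as `translate("<model-tag>", "English", "German", sentence)`, where `<model-tag>` stands for whichever model has been pulled locally, would return one hypothesis per source sentence. The collected hypotheses could then be scored against the reference translations with an automatic evaluation tool.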
