Translating Under Pressure: Domain-Aware LLMs for Crisis Communication
For crisis responders and affected populations, this approach offers a scalable translation option when full multilingual coverage is infeasible.
The paper addresses the lack of parallel data for crisis communication translation by proposing a domain-adaptive pipeline that expands a small reference corpus, fine-tunes a small language model, and applies preference optimization to steer outputs toward simplified (CEFR A2-level) English. Results show improved readability while maintaining adequacy, suggesting that simplified English can serve as a practical lingua franca for emergencies.
Timely and reliable multilingual communication is critical during natural and human-induced disasters, but the development of effective crisis communication solutions is limited by the scarcity of curated parallel data. We propose a domain-adaptive pipeline that expands a small reference corpus by retrieving and filtering data from general corpora. We use the resulting dataset to fine-tune a small language model for crisis-domain translation and then apply preference optimization to bias outputs toward CEFR A2-level English. Automatic and human evaluations show that this approach improves readability while maintaining strong adequacy. Our results indicate that simplified English, combined with domain adaptation, can function as a practical lingua franca for emergency communication when full multilingual coverage is not feasible.
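
The corpus-expansion step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the retrieval model or the filters, so the bag-of-words cosine similarity, the length-ratio filter, the thresholds, and the names `expand_corpus`, `seed`, and `pool` are all assumptions chosen for illustration. The idea shown is only the general one: score each sentence pair from a general corpus against a small crisis-domain seed set, and keep the pairs that look in-domain and well-formed.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (a stand-in for a real sentence encoder)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand_corpus(seed, pool, threshold=0.3, max_len_ratio=2.0):
    """Retrieve in-domain pairs from a general parallel corpus.

    seed: English sentences from the small crisis reference corpus.
    pool: (source, english) pairs from a general corpus.
    threshold and max_len_ratio are hypothetical values, not from the paper.
    """
    seed_vecs = [bow(s) for s in seed]
    kept = []
    for source, english in pool:
        n_src, n_eng = len(source.split()), len(english.split())
        # Filter: drop empty sides and pairs with an implausible length ratio.
        if n_src == 0 or n_eng == 0:
            continue
        if max(n_src, n_eng) / min(n_src, n_eng) > max_len_ratio:
            continue
        # Retrieve: keep the pair if it is close to any seed sentence.
        vec = bow(english)
        score = max((cosine(vec, sv) for sv in seed_vecs), default=0.0)
        if score >= threshold:
            kept.append((source, english, score))
    # Most in-domain pairs first.
    return sorted(kept, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    seed = [
        "Evacuate the building immediately and move to higher ground.",
        "The flood warning remains in effect until tomorrow.",
    ]
    pool = [
        ("Evacúe el edificio y vaya a terreno alto.",
         "Evacuate the building and go to higher ground."),
        ("Me gusta el fútbol.", "I like football."),
    ]
    for source, english, score in expand_corpus(seed, pool):
        print(f"{score:.2f}\t{source}\t{english}")
```

In practice a multilingual sentence encoder would likely replace the bag-of-words scorer, and the retained pairs would then serve as fine-tuning data for the translation model.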