CL · Jan 14

Efficient Multilingual Dialogue Processing via Translation Pipelines and Distilled Language Models

Santiago Martínez Novoa, Nicolás Rozo Fajardo, Diego Alejandro González Vargas, Nicolás Bedoya Figueroa

arXiv:2601.09059v1

Originality: Synthesis-oriented

AI Analysis

This work addresses efficient processing of health-related dialogues in low-resource languages. Its contribution is incremental, since it builds on existing translation and distillation methods.

The paper tackles multilingual dialogue summarization and question answering for Indic languages with a translation pipeline built around a distilled language model, reporting QnA win rates of 86.7% on both Marathi and Tamil.

This paper presents team Kl33n3x's multilingual dialogue summarization and question answering system developed for the NLPAI4Health 2025 shared task. The approach employs a three-stage pipeline: forward translation from Indic languages to English, multitask text generation using a 2.55B parameter distilled language model, and reverse translation back to source languages. By leveraging knowledge distillation techniques, this work demonstrates that compact models can achieve highly competitive performance across nine languages. The system achieved strong win rates across the competition's tasks, with particularly robust performance on Marathi (86.7% QnA), Tamil (86.7% QnA), and Hindi (80.0% QnA), demonstrating the effectiveness of translation-based approaches for low-resource language processing without task-specific fine-tuning.
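The three-stage flow described above (forward translation, English-only generation, reverse translation) can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the `Translator` and `Generator` callables, the `build_prompt` helper, the task names, and the prompt wording are all hypothetical placeholders for whatever translation system and 2.55B-parameter distilled model the authors actually used.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces (assumptions, not from the paper):
#   Translator: (text, source_lang, target_lang) -> translated text
#   Generator:  (prompt) -> generated English text
Translator = Callable[[str, str, str], str]
Generator = Callable[[str], str]


def build_prompt(task: str, dialogue: str, question: str = "") -> str:
    """Build an English prompt for one of two illustrative tasks."""
    if task == "summarize":
        return f"Summarize the following dialogue.\n\n{dialogue}"
    if task == "qa":
        return (
            "Answer the question using the dialogue.\n\n"
            f"{dialogue}\n\nQuestion: {question}"
        )
    raise ValueError(f"unknown task: {task!r}")


@dataclass
class TranslationPipeline:
    translate: Translator
    generate: Generator

    def run(self, dialogue: str, lang: str, task: str, question: str = "") -> str:
        # Stage 1: forward translation from the source language to English.
        english_dialogue = self.translate(dialogue, lang, "en")
        english_question = self.translate(question, lang, "en") if question else ""

        # Stage 2: one English-only model handles every task via prompting.
        prompt = build_prompt(task, english_dialogue, english_question)
        english_output = self.generate(prompt)

        # Stage 3: reverse translation back to the source language.
        return self.translate(english_output, "en", lang)
```

Keeping the generator English-only is what lets a single compact model serve all nine languages without task-specific fine-tuning; the trade-off is that any translation error in stage 1 or stage 3 propagates directly into the final output.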
