Instruction-tuned Large Language Models for Machine Translation in the Medical Domain
This work addresses the need for consistent terminology translation in the medical domain for users, researchers, and translators, but it is incremental as it builds on existing instruction-tuning methods.
The study tackled the low performance of large language models (LLMs) in machine translation for the medical domain. It compared baseline LLMs with instruction-tuned LLMs and incorporated medical terminology into fine-tuning; the instruction-tuned LLMs significantly outperformed the baseline models on automatic metrics.
Large Language Models (LLMs) have shown promising results on machine translation for high-resource language pairs and domains. However, in specialised domains (e.g. medical), LLMs have shown lower performance than standard neural machine translation models. Consistent machine translation of terminology is crucial for users, researchers, and translators in specialised domains. In this study, we compare the performance of baseline LLMs and instruction-tuned LLMs in the medical domain. In addition, we introduce terminology from specialised medical dictionaries into the instruction-formatted datasets used for fine-tuning LLMs. The instruction-tuned LLMs significantly outperform the baseline models on automatic metrics.
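To make the idea of adding dictionary terminology to instruction-formatted data more concrete, the following is a minimal sketch of how one such training example might be assembled. It is not the authors' implementation: the prompt wording, the field names (instruction, input, output), the helper names find_terms and build_example, the language pair, and the naive substring matching against the glossary are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def find_terms(source: str, glossary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (source_term, target_term) pairs whose source term occurs in the sentence.

    Hypothetical helper: uses case-insensitive substring matching, which is an
    assumption; a real pipeline would likely need tokenisation or lemmatisation.
    """
    lowered = source.lower()
    return [(s, t) for s, t in glossary.items() if s.lower() in lowered]


def build_example(
    source: str,
    target: str,
    glossary: Dict[str, str],
    src_lang: str = "English",
    tgt_lang: str = "Spanish",
) -> Dict[str, str]:
    """Build one instruction-formatted example with optional terminology hints.

    The template below is hypothetical, not the one used in the study.
    """
    instruction = (
        f"Translate the following medical text from {src_lang} to {tgt_lang}."
    )
    terms = find_terms(source, glossary)
    if terms:
        # Append the dictionary translations for terms found in the source.
        hints = "; ".join(f'"{s}" -> "{t}"' for s, t in terms)
        instruction += f" Use these term translations: {hints}."
    return {"instruction": instruction, "input": source, "output": target}
```

Under these assumptions, a sentence containing a glossary term receives an instruction that lists the dictionary translation, while a sentence with no matching term receives the plain translation instruction, so both kinds of example can share one dataset format.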