CL AI Sep 30, 2024

Adapting LLMs for the Medical Domain in Portuguese: A Study on Fine-Tuning and Model Evaluation

Pedro Henrique Paiola, Gabriel Lino Garcia, João Renato Ribeiro Manesco, Mateus Roder, Douglas Rodrigues, João Paulo Papa

arXiv:2410.00163v13.47 citations h-index: 17

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for reliable medical virtual assistants for healthcare professionals in Portuguese, but the findings are incremental due to catastrophic forgetting and low inter-rater agreement.

This study evaluated large language models (LLMs) as medical agents in Portuguese, fine-tuning the ChatBode-7B model with translated datasets. The InternLM2 model, pre-trained on medical data, showed the best overall performance in accuracy, completeness, and safety, while fine-tuned DrBode models suffered from catastrophic forgetting.

This study evaluates the performance of large language models (LLMs) as medical agents in Portuguese, aiming to develop a reliable and relevant virtual assistant for healthcare professionals. The HealthCareMagic-100k-en and MedQuAD datasets, translated from English using GPT-3.5, were used to fine-tune the ChatBode-7B model with the PEFT-QLoRA method. The InternLM2 model, which was initially trained on medical data, presented the best overall performance, scoring highly on metrics such as accuracy, completeness, and safety. However, the DrBode models, derived from ChatBode, exhibited catastrophic forgetting of previously acquired medical knowledge. Despite this, these models often performed as well or even better in aspects such as grammaticality and coherence. A significant challenge was low inter-rater agreement, highlighting the need for more robust assessment protocols. This work paves the way for future research, such as evaluating multilingual models specific to the medical field, improving the quality of training data, and developing more consistent evaluation methodologies for the medical field.
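To make the inter-rater agreement concern concrete, the sketch below computes Fleiss' kappa, a common agreement statistic for several raters assigning items to categories. The abstract does not say which statistic the authors used, so the choice of Fleiss' kappa is an assumption, and the function name, category labels, and ratings are hypothetical illustrations rather than data from the paper.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Assumption: every item is rated by the same number of raters.
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: for each item, the share of rater pairs that agree.
    pairs = n_raters * (n_raters - 1)
    p_items = [(sum(c * c for c in row) - n_raters) / pairs for row in counts]
    p_observed = sum(p_items) / n_items

    # Expected agreement by chance, from the overall category proportions.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        # All ratings fall in one category; treat as perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 4 model answers, 3 raters, and 3 categories
# (say "inadequate", "partial", "adequate"). Not data from the paper.
ratings = [
    [3, 0, 0],
    [1, 1, 1],
    [0, 2, 1],
    [0, 0, 3],
]
print(f"Fleiss' kappa: {fleiss_kappa(ratings):.3f}")
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance, so a low kappa on criteria such as accuracy or safety would signal that the rating protocol itself needs tightening before model scores can be compared.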
