CL · AI · Apr 14, 2023

MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data

arXiv:2304.08247v3 · 450 citations · h-index: 43 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses patient privacy and accessibility in medical AI by providing open-source models that healthcare professionals can deploy on-premises.

The authors address the need for open-source medical conversational AI by assembling a dataset of over 160,000 entries for fine-tuning LLMs. They report that fine-tuned models outperformed pre-trained-only models on medical certification exams.

As large language models (LLMs) like OpenAI's GPT series continue to make strides, we witness the emergence of artificial intelligence applications in an ever-expanding range of fields. In medicine, these LLMs hold considerable promise for improving medical workflows, diagnostics, patient care, and education. Yet, there is an urgent need for open-source models that can be deployed on-premises to safeguard patient privacy. In our work, we present an innovative dataset consisting of over 160,000 entries, specifically crafted to fine-tune LLMs for effective medical applications. We investigate the impact of fine-tuning publicly accessible pre-trained LLMs on this dataset, and we then compare the performance of pre-trained-only models against the fine-tuned models on the examinations that future medical doctors must pass to achieve certification.

Code Implementations: 1 repo
