CL · Jun 17, 2025

From Chat to Checkup: Can Large Language Models Assist in Diabetes Prediction?

Shadman Sakib, Oishy Fatema Akhand, Ajwad Abrar

arXiv:2506.14949v1 · 2025 International Conference on Quantum Photonics, Artificial Intelligence, and Networking (QPAIN)

Originality: Incremental advance

AI Analysis

The paper addresses the problem of applying LLMs to structured medical data for diabetes prediction, which could assist healthcare professionals. The contribution is incremental: it builds on existing methods and mainly adds new model comparisons.

This study tested how well large language models (LLMs) predict diabetes using zero-shot, one-shot, and three-shot prompting on the Pima Indian Diabetes Database. GPT-4o (proprietary) and Gemma-2-27B (open-source) achieved the highest accuracy in few-shot settings, and Gemma-2-27B also outperformed the traditional machine learning models in F1-score.
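The comparison above rests on accuracy and F1-score, which can rank models differently when positive cases are the minority. As a reminder of how the four metrics the paper reports relate to one another, here is a minimal sketch for binary labels (1 = diabetic, 0 = non-diabetic). It is not the authors' evaluation code; the helper name `binary_metrics` and the label lists are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; ratios with a zero denominator
    are reported as 0.0 instead of raising.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, not taken from the paper or the dataset.
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
y_model = [1, 0, 0, 1, 0, 0, 0, 0]   # one hit, one false alarm, one miss
y_always_no = [0] * len(y_true)      # trivial "never diabetic" baseline

print(binary_metrics(y_true, y_model))      # accuracy 0.75, f1 0.5
print(binary_metrics(y_true, y_always_no))  # accuracy 0.75, f1 0.0
```

Both predictors reach the same accuracy on this toy sample, but the trivial baseline never finds a positive case, so its recall and F1 drop to zero. This is why F1 is a useful complement to accuracy when judging diabetes predictors.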

While Machine Learning (ML) and Deep Learning (DL) models have been widely used for diabetes prediction, the use of Large Language Models (LLMs) on structured numerical data remains underexplored. In this study, we test the effectiveness of LLMs in predicting diabetes using zero-shot, one-shot, and three-shot prompting methods. We conduct an empirical analysis using the Pima Indian Diabetes Database (PIDD). We evaluate six LLMs, including four open-source models: Gemma-2-27B, Mistral-7B, Llama-3.1-8B, and Llama-3.2-2B. We also test two proprietary models: GPT-4o and Gemini Flash 2.0. In addition, we compare their performance with three traditional machine learning models: Random Forest, Logistic Regression, and Support Vector Machine (SVM). We use accuracy, precision, recall, and F1-score as evaluation metrics. Our results show that proprietary LLMs perform better than open-source ones, with GPT-4o and the open-source Gemma-2-27B achieving the highest accuracy in few-shot settings. Notably, Gemma-2-27B also outperforms the traditional ML models in terms of F1-score. However, challenges remain, such as performance variation across prompting strategies and the need for domain-specific fine-tuning. This study shows that LLMs can be useful for medical prediction tasks and encourages future work on prompt engineering and hybrid approaches to improve healthcare predictions.
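The paper's exact prompt templates are not reproduced here. To illustrate what zero-, one-, and three-shot prompting on tabular data involves, the sketch below serializes a PIDD-style record into text, prepends k labeled examples, and maps the model's free-text reply back to a binary label. The feature names follow the standard PIDD columns. The instruction wording, the `serialize`, `build_prompt`, and `parse_label` helpers, and the example records are hypothetical assumptions, and no LLM is actually called.

```python
from typing import Dict, Optional, Sequence, Tuple

# Standard PIDD feature columns (the Outcome column is the label).
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]

Record = Dict[str, float]


def serialize(record: Record) -> str:
    """Turn one tabular row into a 'Feature: value, ...' string."""
    return ", ".join(f"{name}: {record[name]}" for name in FEATURES)


def build_prompt(query: Record,
                 shots: Sequence[Tuple[Record, int]] = ()) -> str:
    """Build a k-shot prompt where k = len(shots) (0, 1 or 3 in the study).

    The instruction text is a hypothetical template, not the authors' prompt.
    """
    lines = ["Predict whether the patient has diabetes. Answer 'Yes' or 'No'.", ""]
    for record, outcome in shots:  # labeled demonstrations come first
        lines.append(f"Patient: {serialize(record)}")
        lines.append(f"Diabetes: {'Yes' if outcome == 1 else 'No'}")
        lines.append("")
    lines.append(f"Patient: {serialize(query)}")
    lines.append("Diabetes:")  # the model is expected to complete this line
    return "\n".join(lines)


def parse_label(reply: str) -> Optional[int]:
    """Map a reply to 1 (Yes), 0 (No) or None if it cannot be parsed."""
    words = reply.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,:;!")
    if first == "yes":
        return 1
    if first == "no":
        return 0
    return None


# Hypothetical records with made-up values, not rows from the dataset.
shot = ({"Pregnancies": 2, "Glucose": 95, "BloodPressure": 70,
         "SkinThickness": 25, "Insulin": 80, "BMI": 24.5,
         "DiabetesPedigreeFunction": 0.3, "Age": 28}, 0)
query = {"Pregnancies": 4, "Glucose": 160, "BloodPressure": 80,
         "SkinThickness": 32, "Insulin": 150, "BMI": 35.2,
         "DiabetesPedigreeFunction": 0.8, "Age": 52}

print(build_prompt(query))          # zero-shot
print(build_prompt(query, [shot]))  # one-shot; three-shot passes three pairs
print(parse_label("Yes, the patient is likely diabetic."))
```

Returning `None` for replies that start with neither "Yes" nor "No" lets an evaluation loop count malformed outputs separately instead of silently assigning them to one class. That matters when prompting strategies differ in how often a model follows the answer format.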

