A Multi-Layered Large Language Model Framework for Disease Prediction
This work addresses disease prediction for Arabic-speaking users in social telehealth, but its contribution is incremental: it applies existing methods to a specific domain.
The study tackled disease classification and symptom severity assessment from Arabic medical text on social media by evaluating preprocessing techniques and BERT models, achieving 83% accuracy for type classification and 69% for severity assessment with CAMeL-BERT and NER.
Social telehealth has revolutionized healthcare by enabling patients to share symptoms and receive medical consultations remotely. Users frequently post symptoms on social media and online health platforms, generating a vast repository of medical data that can be leveraged for disease classification and symptom severity assessment. Large language models (LLMs), such as LLAMA3, GPT-3.5 Turbo, and BERT, can process complex medical text to support disease classification. This study explores three Arabic medical text preprocessing techniques: text summarization, text refinement, and Named Entity Recognition (NER). Among CAMeL-BERT, AraBERT, and Asafaya-BERT fine-tuned with LoRA, CAMeL-BERT with NER-augmented text performed best (83% accuracy for type classification, 69% for severity assessment). Non-fine-tuned models performed poorly (13%-20% for type classification, 40%-49% for severity assessment). These results suggest that integrating LLMs into social telehealth systems can improve diagnostic accuracy and treatment outcomes.
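As a rough illustration of what "NER-augmented text" could look like before it is passed to a BERT classifier, the following minimal sketch appends extracted entities to the original post. The abstract does not specify the augmentation format, so the function name `augment_with_entities`, the `LABEL: surface` tag layout, the ` [SEP] ` separator, and the `SYMPTOM` label are all assumptions made for illustration, and the entity list is hand-written rather than produced by an actual NER model.

```python
from typing import List, Tuple

# Hypothetical entity type: (surface text, label), as some upstream NER step
# might produce. The labels and format are illustrative assumptions.
Entity = Tuple[str, str]


def augment_with_entities(text: str, entities: List[Entity],
                          sep: str = " [SEP] ") -> str:
    """Append de-duplicated 'LABEL: surface' tags to the original post."""
    seen = set()
    tags = []
    for surface, label in entities:
        key = (surface.strip(), label)
        # Skip empty spans and entities that were already added.
        if not key[0] or key in seen:
            continue
        seen.add(key)
        tags.append(f"{label}: {key[0]}")
    if not tags:
        # Nothing was extracted, so the classifier sees the raw text.
        return text
    return text + sep + " ; ".join(tags)


# Example post ("I have had a severe headache and fever for three days")
# with hand-written entities standing in for real NER output.
post = "أعاني من صداع شديد وحمى منذ ثلاثة أيام"
entities = [("صداع", "SYMPTOM"), ("حمى", "SYMPTOM"), ("صداع", "SYMPTOM")]
augmented = augment_with_entities(post, entities)
print(augmented)
```

Keeping the original post and appending the entities, rather than replacing the text with them, is one plausible design: the classifier retains the full context while the symptom mentions are made explicit.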