Fine-tuning Large Language Models for Automated Diagnostic Screening Summaries
This work addresses the need for scalable mental health support in developing countries by automating diagnostic screenings, but the contribution is incremental, since it applies fine-tuning to existing LLMs on a custom dataset.
The paper tackles the problem of generating concise summaries from mental state examinations for automated diagnostic screening in mental health. The authors' top-performing fine-tuned model outperforms existing models, with ROUGE-1 and ROUGE-L scores of 0.810 and 0.764, respectively, and shows promising generalizability on the publicly available D4 dataset.
Improving mental health support in developing countries is a pressing need. One potential solution is the development of scalable, automated systems to conduct diagnostic screenings, which could help alleviate the burden on mental health professionals. In this work, we evaluate several state-of-the-art Large Language Models (LLMs), with and without fine-tuning, on our custom dataset for generating concise summaries from mental state examinations. We rigorously evaluate four different models for summary generation using established ROUGE metrics and input from human evaluators. The results show that our top-performing fine-tuned model outperforms existing models, achieving ROUGE-1 and ROUGE-L values of 0.810 and 0.764, respectively. Furthermore, we assess the fine-tuned model's generalizability on the publicly available D4 dataset; the outcomes are promising, indicating its potential applicability beyond our custom dataset.
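For readers unfamiliar with the reported metrics, the following is a minimal sketch of how ROUGE-1 and ROUGE-L can be computed for a single generated summary against a reference. It is not the evaluation code used in this work. It assumes F1-style scores, lowercase whitespace tokenization, and no stemming, none of which is specified in the abstract. The function names (tokenize, f1, rouge_1, lcs_length, rouge_l) and the example sentences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()


def f1(matches, n_candidate, n_reference):
    # Harmonic mean of precision (matches / candidate length)
    # and recall (matches / reference length).
    if matches == 0 or n_candidate == 0 or n_reference == 0:
        return 0.0
    precision = matches / n_candidate
    recall = matches / n_reference
    return 2 * precision * recall / (precision + recall)


def rouge_1(candidate, reference):
    # Unigram overlap with clipped counts: each reference token
    # can be matched at most as often as it occurs in the reference.
    cand, ref = tokenize(candidate), tokenize(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return f1(overlap, len(cand), len(ref))


def lcs_length(a, b):
    # Length of the longest common subsequence, computed with
    # row-by-row dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    # Sentence-level ROUGE-L: F1 over the longest common subsequence,
    # which rewards tokens that appear in the same order as the reference.
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical example sentences, not drawn from any dataset.
    reference = "the patient reports persistent low mood and poor sleep"
    candidate = "the patient reports low mood"
    print(f"ROUGE-1: {rouge_1(candidate, reference):.3f}")
    print(f"ROUGE-L: {rouge_l(candidate, reference):.3f}")
```

ROUGE-1 ignores word order, while ROUGE-L is sensitive to it, so a summary that reuses the reference vocabulary in a different order scores lower on ROUGE-L than on ROUGE-1. Corpus-level figures such as those reported above are typically averages of per-summary scores, and established ROUGE implementations may differ from this sketch in tokenization and stemming.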