Classification of kinetic-related injury in hospital triage data using NLP
This work addresses the challenge of applying NLP to sensitive hospital triage data in support of medical staff and researchers; it is incremental, building on existing methods with minor adaptations.
The paper tackles the problem of classifying kinetic-related injury in hospital triage data using NLP. It develops a pipeline that fine-tunes a pre-trained LLM on a small open-source dataset using a GPU, then further fine-tunes it on a hospital-specific dataset using a CPU, achieving successful classification with limited compute resources.
Triage notes, created at the start of a patient's hospital visit, contain a wealth of information that can help medical staff and researchers understand Emergency Department patient epidemiology and the degree of time-dependent illness or injury. Unfortunately, applying modern Natural Language Processing (NLP) and Machine Learning techniques to triage data faces three challenges. First, hospital data contains highly sensitive information that is subject to privacy regulations and thus needs to be analysed on site. Second, most hospitals and medical facilities lack the hardware needed to fine-tune a Large Language Model (LLM), much less train one from scratch. Third, identifying the records of interest requires experts to label the datasets manually, which is time-consuming and costly. We present in this paper a pipeline that enables the classification of triage data using an LLM and limited compute resources. We first fine-tuned a pre-trained LLM with a classifier using a small (2k-sample) open-source dataset on a GPU, and then further fine-tuned the model on a hospital-specific dataset of 1,000 samples on a CPU. We demonstrate that by carefully curating the datasets and leveraging existing models and open-source data, we can successfully classify triage data with limited compute resources.
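The two-stage pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: a tiny bag-of-words logistic regression (`ToyTriageClassifier`) stands in for the pre-trained LLM with a classifier head, the example notes and labels are invented, and the epoch counts and learning rates are illustrative assumptions. What it shows is the sequencing: train on the larger public dataset first, then copy that model and continue training on the smaller hospital dataset.

```python
import copy
import math
from collections import defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would use the LLM's own tokenizer.
    return text.lower().split()


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ToyTriageClassifier:
    """Hypothetical stand-in for 'pre-trained LLM + classifier head':
    a binary bag-of-words logistic regression trained with SGD."""

    def __init__(self):
        self.weights = defaultdict(float)  # token -> weight
        self.bias = 0.0

    def prob(self, text):
        # Probability that a note describes a kinetic-related injury (label 1).
        # .get() avoids inserting unseen tokens into the weight table.
        z = self.bias + sum(self.weights.get(t, 0.0) for t in tokenize(text))
        return sigmoid(z)

    def predict(self, text):
        return int(self.prob(text) >= 0.5)

    def fit(self, data, epochs, lr):
        # Plain SGD on the log loss; continues from whatever weights
        # the model already has (this is what makes stage 2 a warm start).
        for _ in range(epochs):
            for text, label in data:
                err = self.prob(text) - label
                self.bias -= lr * err
                for t in tokenize(text):
                    self.weights[t] -= lr * err
        return self


# Hypothetical toy notes (1 = kinetic-related injury); NOT from the paper's datasets.
public_notes = [
    ("fell from ladder struck head", 1),
    ("motor vehicle collision chest pain", 1),
    ("fever and cough for three days", 0),
    ("abdominal pain since yesterday", 0),
]
hospital_notes = [
    ("mbc rider thrown from bike", 1),
    ("pedestrian hit by car", 1),
    ("headache and vomiting", 0),
    ("rash on arms", 0),
]

# Stage 1 (a GPU in the paper): train on the larger open-source dataset.
stage1 = ToyTriageClassifier().fit(public_notes, epochs=50, lr=0.5)

# Stage 2 (a CPU in the paper): copy the stage-1 model and continue training
# briefly, with a smaller learning rate, on the local hospital dataset.
stage2 = copy.deepcopy(stage1).fit(hospital_notes, epochs=20, lr=0.1)

for text, label in hospital_notes:
    print(f"{text!r}: label={label} stage1={stage1.predict(text)} stage2={stage2.predict(text)}")
```

The key design point is the warm start: stage 2 begins from the stage-1 parameters rather than from scratch, so the small local dataset only has to adapt an already useful model. In a real transformer pipeline one would typically also keep the stage-2 learning rate small, and might freeze lower layers to reduce CPU cost; whether the paper does either is not stated here.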