IR CL Aug 3, 2023

Evaluating ChatGPT text-mining of clinical records for obesity monitoring

Ivo S. Fins, Heather Davies, Sean Farrell, Jose R. Torres, Gina Pinchbeck, Alan D. Radford, Peter-John Noble

arXiv:2308.01666v1 | 1 citation | h-index: 38

Originality Synthesis-oriented

AI Analysis

This work addresses the problem of monitoring obesity in veterinary medicine by evaluating AI tools for text-mining clinical records. The contribution is incremental: it compares existing methods on a specific dataset.

The study compared ChatGPT and a regular expression method (RegexT) for extracting overweight body condition scores from veterinary clinical records, finding that ChatGPT had higher recall (100% vs. 72.6%) but lower precision (89.3% vs. 100%).

Background: Veterinary clinical narratives remain a largely untapped resource for addressing complex diseases. Here we compare the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in veterinary narratives. Methods: BCS values were extracted from 4,415 anonymised clinical narratives either with RegexT or by appending the narrative to a prompt sent to ChatGPT, coercing the model to return the BCS information. Data were manually reviewed for comparison. Results: The precision of RegexT was higher (100%, 95% CI 94.81-100%) than that of ChatGPT (89.3%, 95% CI 82.75-93.64%). However, the recall of ChatGPT (100%, 95% CI 96.18-100%) was considerably higher than that of RegexT (72.6%, 95% CI 63.92-79.94%). Limitations: Subtle prompt engineering is needed to improve ChatGPT output. Conclusions: Large language models create diverse opportunities and, whilst complex, present an intuitive interface to information, but they require careful implementation to avoid unpredictable errors.
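To make the comparison in the abstract concrete, here is a minimal Python sketch of regex-based BCS extraction and of scoring an extractor against manual review. Everything in it is an assumption for illustration: the pattern is hypothetical and is not the authors' RegexT, the overweight cut-offs (above 5 on a 9-point scale, above 3 on a 5-point scale) are assumed, and the Wilson score interval is one common choice for a 95% CI that may differ from the method used in the paper.

```python
import math
import re

# Hypothetical pattern (NOT the authors' RegexT): matches e.g. "BCS 7/9" or "bcs: 6.5 / 9".
BCS_PATTERN = re.compile(
    r"\bBCS\s*[:=]?\s*(\d(?:\.\d)?)\s*/\s*(5|9)\b", re.IGNORECASE
)


def extract_bcs(narrative):
    """Return (score, scale) for the first BCS mention, or None if absent."""
    match = BCS_PATTERN.search(narrative)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def is_overweight(score, scale):
    """Assumed cut-offs: above 5 on a 9-point scale, above 3 on a 5-point scale."""
    return score > (5 if scale == 9 else 3)


def precision_recall(predicted, actual):
    """Compare boolean predictions with manually reviewed boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

In this framing, precision is the share of records flagged as overweight that manual review confirms, and recall is the share of manually confirmed overweight records that the extractor flags. A strict pattern like the one above tends to favour precision over recall, because any phrasing it does not anticipate is missed.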
