IR, CL, Aug 3, 2023

Evaluating ChatGPT text-mining of clinical records for obesity monitoring

arXiv:2308.01666v1, 1 citation, h-index: 38
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of monitoring obesity in veterinary medicine by evaluating AI tools for text-mining clinical records. It is incremental, since it compares existing methods on a specific dataset.

The study compared ChatGPT and a regular expression method (RegexT) for extracting overweight body condition scores from veterinary clinical records, finding that ChatGPT had higher recall (100% vs. 72.6%) but lower precision (89.3% vs. 100%).
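As a rough illustration of how precision and recall figures of this kind can be computed with a 95% confidence interval, here is a minimal sketch. The counts are illustrative placeholders, not figures from the paper, and the Wilson score interval is an assumption: the summary does not say which interval method the authors used.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# Illustrative counts only (NOT taken from the paper).
tp, fp, fn = 40, 5, 10
precision, recall = precision_recall(tp, fp, fn)
p_lo, p_hi = wilson_interval(tp, tp + fp)
r_lo, r_hi = wilson_interval(tp, tp + fn)
print(f"precision {precision:.1%} (95% CI {p_lo:.1%}-{p_hi:.1%})")
print(f"recall    {recall:.1%} (95% CI {r_lo:.1%}-{r_hi:.1%})")
```

Precision penalises false positives (records flagged as overweight that manual review rejects), while recall penalises false negatives (overweight scores the method missed), which is why the two methods can rank differently on each metric.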

Background: Veterinary clinical narratives remain a largely untapped resource for addressing complex diseases. Here we compare the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in veterinary narratives. Methods: BCS values were extracted from 4,415 anonymised clinical narratives using either RegexT or by appending the narrative to a prompt sent to ChatGPT, coercing the model to return the BCS information. Data were manually reviewed for comparison. Results: The precision of RegexT was higher (100%, 95% CI 94.81-100%) than that of ChatGPT (89.3%, 95% CI 82.75-93.64%). However, the recall of ChatGPT (100%, 95% CI 96.18-100%) was considerably higher than that of RegexT (72.6%, 95% CI 63.92-79.94%). Limitations: Subtle prompt engineering is needed to improve ChatGPT output. Conclusions: Large language models create diverse opportunities and, whilst complex, present an intuitive interface to information but require careful implementation to avoid unpredictable errors.
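The two extraction routes described in the Methods can be sketched roughly as follows. Everything here is hypothetical: the pattern is not the paper's RegexT, the overweight cutoff of 6 on a 9-point scale is an assumed threshold, and the prompt wording is invented for illustration. No API call is made; the sketch only shows the narrative being appended to a fixed prompt.

```python
import re

# Hypothetical pattern (NOT the paper's RegexT): matches forms such as
# "BCS 7/9", "bcs: 6.5/9" or "body condition score 8 / 9".
BCS_PATTERN = re.compile(
    r"\b(?:BCS|body\s+condition\s+score)\s*[:=]?\s*(\d(?:\.\d)?)\s*/\s*9\b",
    re.IGNORECASE,
)

# Assumed cutoff for "overweight" on a 9-point scale (an assumption).
OVERWEIGHT_THRESHOLD = 6.0

# Invented prompt text, for illustration only.
PROMPT_PREFIX = (
    "Report the body condition score (BCS) stated in the following "
    "veterinary clinical narrative, or 'none' if no score is given.\n\n"
)


def extract_overweight_bcs(narrative):
    """Return every BCS value in the narrative at or above the cutoff."""
    scores = [float(m.group(1)) for m in BCS_PATTERN.finditer(narrative)]
    return [s for s in scores if s >= OVERWEIGHT_THRESHOLD]


def build_prompt(narrative):
    """Append the narrative to the fixed prompt, as the Methods describe."""
    return PROMPT_PREFIX + narrative


note = "Exam unremarkable. BCS 7/9, advised weight-loss diet."
print(extract_overweight_bcs(note))
print(build_prompt(note))
```

A rigid pattern like this only fires on the exact notations it anticipates, which fits the reported trade-off: high precision, but lower recall when clinicians write the score in an unexpected form.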

