IRCL | Jul 6, 2024

Large language models are good medical coders, if provided with tools

arXiv:2407.12849v1 | 10 citations | h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses the problem of improving the efficiency and accuracy of medical coding in healthcare systems. The advance is incremental: it builds on existing retrieval-based approaches and is evaluated only on simplified inputs.

The study tackled automated ICD-10-CM medical coding by comparing a novel Retrieve-Rank system against a Vanilla LLM approach, finding that the Retrieve-Rank system achieved 100% accuracy on a dataset of 100 single-term medical conditions while the Vanilla LLM achieved only 6% accuracy.

This study presents a novel two-stage Retrieve-Rank system for automated ICD-10-CM medical coding, comparing its performance against a Vanilla Large Language Model (LLM) approach. Evaluating both systems on a dataset of 100 single-term medical conditions, the Retrieve-Rank system achieved 100% accuracy in predicting correct ICD-10-CM codes, significantly outperforming the Vanilla LLM (GPT-3.5-turbo), which achieved only 6% accuracy. Our analysis demonstrates the Retrieve-Rank system's superior precision in handling various medical terms across different specialties. While these results are promising, we acknowledge the limitations of using simplified inputs and the need for further testing on more complex, realistic medical cases. This research contributes to the ongoing effort to improve the efficiency and accuracy of medical coding, highlighting the importance of retrieval-based approaches.
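The two-stage idea described in the abstract (first retrieve a short list of candidate codes, then rank them and keep the top one) can be illustrated with a minimal sketch. This is not the authors' implementation: the five-entry code table, the token-overlap retriever, and the string-similarity ranker (standing in for an LLM re-ranker) are all assumptions chosen so the example runs with the Python standard library alone, and the function names `retrieve`, `rank`, `predict`, and `accuracy` are hypothetical.

```python
from difflib import SequenceMatcher

# Tiny illustrative subset of ICD-10-CM (code -> description).
# Assumption: a real system would index the full code set.
ICD10CM = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "E10.9": "Type 1 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J45.909": "Unspecified asthma, uncomplicated",
    "N39.0": "Urinary tract infection, site not specified",
}


def tokens(text):
    """Lowercase the text and split it into a set of words, ignoring , ( )."""
    for ch in ",()":
        text = text.replace(ch, " ")
    return set(text.lower().split())


def retrieve(term, k=3):
    """Stage 1: return up to k codes whose descriptions share the most words
    with the input term (a stand-in for embedding-based retrieval)."""
    query = tokens(term)
    scored = [(len(query & tokens(desc)), code) for code, desc in ICD10CM.items()]
    scored = [(score, code) for score, code in scored if score > 0]
    scored.sort(key=lambda sc: (-sc[0], sc[1]))
    return [code for _, code in scored[:k]]


def rank(term, candidates):
    """Stage 2: re-order candidates by string similarity to the term
    (a stand-in for the LLM-based ranking step)."""
    def similarity(code):
        return SequenceMatcher(None, term.lower(), ICD10CM[code].lower()).ratio()
    return sorted(candidates, key=similarity, reverse=True)


def predict(term):
    """Run both stages and return the top-ranked code, or None if nothing matched."""
    ranked = rank(term, retrieve(term))
    return ranked[0] if ranked else None


def accuracy(pairs):
    """Fraction of (term, expected_code) pairs predicted exactly."""
    return sum(predict(term) == code for term, code in pairs) / len(pairs)
```

Calling `predict("hypertension")` runs both stages on a single-term input of the kind used in the evaluation, and `accuracy` computes the exact-match metric reported in the abstract over a list of (term, code) pairs. The sketch only shows why narrowing the choice to retrieved candidates constrains the final answer to codes that exist in the table; it says nothing about how the paper's system performs.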

Code Implementations: 1 repo
