CL · Aug 16, 2023

Large Language Models for Granularized Barrett's Esophagus Diagnosis Classification

Jenna Kefeli, Ali Soroush, Courtney J. Diamond, Haley M. Zylberberg, Benjamin May, Julian A. Abrams, Chunhua Weng, Nicholas Tatonetti

arXiv:2308.08660v10.93 citations · h-index: 48

Originality: Incremental advance

AI Analysis

This work addresses the lack of granularity in diagnostic codes for Barrett's esophagus, a precursor to esophageal cancer, by automating data extraction for research and clinical use. The advance is incremental: its best model performs comparably to a tailored rule-based system developed on the same data.

The researchers developed a transformer-based method to extract granular diagnostic phenotypes from Barrett's esophagus pathology reports, achieving an F1-score of 0.964 for binary dysplasia classification and 0.911 for multi-class diagnosis classification.

Diagnostic codes for Barrett's esophagus (BE), a precursor to esophageal cancer, lack granularity and precision for many research or clinical use cases. Laborious manual chart review is required to extract key diagnostic phenotypes from BE pathology reports. We developed a generalizable transformer-based method to automate data extraction. Using pathology reports from Columbia University Irving Medical Center with gastroenterologist-annotated targets, we performed binary dysplasia classification as well as granularized multi-class BE-related diagnosis classification. We utilized two clinically pre-trained large language models, with best model performance comparable to a highly tailored rule-based system developed using the same data. Binary dysplasia extraction achieves 0.964 F1-score, while the multi-class model achieves 0.911 F1-score. Our method is generalizable and faster to implement as compared to a tailored rule-based approach.
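The F1-scores quoted above combine precision and recall for each class. As a minimal sketch of how such scores can be computed, the Python below derives per-class F1 and an unweighted (macro) average from gold and predicted labels. The label names and toy data are hypothetical, and macro averaging is an assumption: the abstract does not state which averaging scheme was used for the multi-class result.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, scoring each label one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, not confirmed by the source)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not data from the paper.
gold = ["dysplasia", "dysplasia", "no_dysplasia", "no_dysplasia"]
pred = ["dysplasia", "no_dysplasia", "no_dysplasia", "no_dysplasia"]

per_class = f1_per_class(gold, pred)
overall = macro_f1(gold, pred)
```

For the binary dysplasia task, the F1 of the positive class alone is the more common convention, which corresponds to reading `per_class["dysplasia"]` rather than the macro average.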
