CL AI IR · May 24, 2023

Neural Summarization of Electronic Health Records

Koyena Pal, Seyed Ali Bahrainian, Laura Mercurio, Carsten Eickhoff

arXiv:2305.15222v1 · 1.35 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the time-consuming task of writing discharge documentation for medical practitioners, but it is incremental: it applies existing summarization methods to a specific healthcare dataset.

This study tackled the problem of automatically generating hospital discharge summaries from nursing notes using neural network models. It found that fine-tuning models such as BART and FLAN-T5 raised ROUGE F1 scores by up to 80% in relative terms, with FLAN-T5 achieving the highest score of 45.6.

Hospital discharge documentation is among the most essential, yet most time-consuming, documents written by medical practitioners. The objective of this study was to automatically generate hospital discharge summaries using neural network summarization models. We studied various data preparation and neural network training techniques that generate discharge summaries. Using nursing notes and discharge summaries from the MIMIC-III dataset, we studied the viability of the automatic generation of various sections of a discharge summary using four state-of-the-art neural network summarization models (BART, T5, Longformer and FLAN-T5). Our experiments indicated that training environments including nursing notes as the source, and discrete sections of the discharge summary as the target output (e.g., "History of Present Illness"), improve language model efficiency and text quality. According to our findings, the fine-tuned BART model improved its ROUGE F1 score by 43.6% against its standard off-the-shelf version. We also found that fine-tuning the baseline BART model with other setups caused different degrees of improvement (up to 80% relative improvement). We also observed that a fine-tuned T5 generally achieves higher ROUGE F1 scores than other fine-tuned models and that a fine-tuned FLAN-T5 achieves the highest ROUGE score overall, i.e., 45.6. For the majority of the fine-tuned language models, summarizing discharge summary report sections separately quantitatively outperformed summarizing the entire report. On the other hand, fine-tuning language models that were previously instruction fine-tuned showed better performance in summarizing entire reports. This study concludes that a focused dataset designed for the automatic generation of discharge summaries by a language model can produce coherent Discharge Summary sections.
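The abstract reports results as ROUGE F1 scores and as relative improvements over an off-the-shelf baseline. The following is a minimal sketch of how those two quantities can be computed, assuming a simplified unigram (ROUGE-1) overlap with whitespace tokenization and no stemming. The abstract does not state which ROUGE variant or toolkit the authors used, so the function names `rouge1_f1` and `relative_improvement` and the example sentences are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary.

    Simplified assumption: lowercase whitespace tokens, no stemming.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline: float, fine_tuned: float) -> float:
    """Percentage gain of fine_tuned over baseline, relative to baseline."""
    return (fine_tuned - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical sentences, not taken from MIMIC-III.
    reference = "patient admitted with chest pain"
    candidate = "patient admitted with pain"
    print(f"ROUGE-1 F1: {rouge1_f1(reference, candidate):.3f}")
    # Hypothetical scores chosen only to show the arithmetic.
    print(f"Relative improvement: {relative_improvement(20.0, 30.0):.1f}%")
```

Under this definition, a relative improvement of 43.6% means the fine-tuned score equals 1.436 times the baseline score; it is not a gain of 43.6 ROUGE points.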
