LG · AI · CL · Feb 24, 2023

Modelling Temporal Document Sequences for Clinical ICD Coding

arXiv:2302.12666v1 · 272 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This paper addresses ICD coding of clinical documentation and offers an incremental improvement, mainly by leveraging more of the clinical notes available for each hospital stay.

The paper tackles the problem of ICD coding by using all clinical notes from a hospital stay, rather than just the discharge summary, and proposes a hierarchical transformer architecture that incorporates metadata embeddings. The model outperforms prior state-of-the-art methods when using only discharge summaries and achieves further improvements with all notes.
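The hierarchical idea can be illustrated in plain Python. The example takes every note in a stay, orders the notes by time, splits each into fixed-length chunks, and encodes each chunk. It then adds learned embeddings for the chunk's position, time and note type, and the resulting sequence is what an upper-level transformer would consume. This is a minimal sketch under stated assumptions, not the paper's implementation. The bag-of-tokens `encode_chunk` stands in for the lower-level transformer. Chunking, summing (rather than concatenating) the metadata embeddings, and the values of `DIM`, `MAX_CHUNKS`, the log-scale `time_bucket` and the `NOTE_TYPES` mapping are all hypothetical choices.

```python
import math
import random
from dataclasses import dataclass

DIM = 8              # hypothetical embedding width (real models use hundreds)
MAX_CHUNKS = 16      # hypothetical cap on chunks per hospital stay
N_TIME_BUCKETS = 16  # hypothetical number of time buckets
NOTE_TYPES = {"Discharge summary": 0, "Nursing": 1, "Radiology": 2, "Other": 3}

_rng = random.Random(0)

def _table(rows):
    """A small randomly initialised embedding table of shape rows x DIM."""
    return [[_rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(rows)]

POSITION_EMB = _table(MAX_CHUNKS)
TIME_EMB = _table(N_TIME_BUCKETS)
TYPE_EMB = _table(len(NOTE_TYPES))

@dataclass
class Chunk:
    tokens: list[str]
    position: int   # index of the chunk within the whole stay
    hours: float    # when the note was written, in hours since admission
    note_type: str

def chunk_notes(notes, chunk_len):
    """Order (hours, note_type, text) notes by time and split each into token chunks."""
    chunks = []
    for hours, note_type, text in sorted(notes, key=lambda n: n[0]):
        tokens = text.split()
        for start in range(0, len(tokens), chunk_len):
            chunks.append(Chunk(tokens[start:start + chunk_len], len(chunks), hours, note_type))
    return chunks[:MAX_CHUNKS]  # hypothetical truncation to a fixed budget

def time_bucket(hours):
    """Log-scale bucketing of elapsed time (hypothetical choice)."""
    return min(int(math.log2(1.0 + hours)), N_TIME_BUCKETS - 1)

def encode_chunk(chunk):
    """Stand-in for the lower-level transformer: a hashed bag-of-tokens vector."""
    vec = [0.0] * DIM
    for tok in chunk.tokens:
        vec[sum(tok.encode()) % DIM] += 1.0 / len(chunk.tokens)
    return vec

def chunk_input(chunk):
    """Chunk vector plus position, time and note-type embeddings, summed element-wise."""
    type_id = NOTE_TYPES.get(chunk.note_type, NOTE_TYPES["Other"])
    parts = [encode_chunk(chunk), POSITION_EMB[chunk.position],
             TIME_EMB[time_bucket(chunk.hours)], TYPE_EMB[type_id]]
    return [sum(values) for values in zip(*parts)]

notes = [
    (30.0, "Nursing", "pt stable overnight"),
    (2.0, "Radiology", "chest xray no acute findings noted today"),
    (96.0, "Discharge summary", "discharged home"),
]
chunks = chunk_notes(notes, chunk_len=4)
sequence = [chunk_input(c) for c in chunks]  # input to the upper-level transformer
```

Adding metadata embeddings to the content vector mirrors how transformers commonly add positional embeddings to token embeddings. Under this design, chunks from different notes stay distinguishable even when their text is similar.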

Past studies on the ICD coding problem focus on predicting clinical codes primarily based on the discharge summary. This covers only a small fraction of the notes generated during each hospital stay and leaves potential for improving performance by analysing all the available clinical notes. We propose a hierarchical transformer architecture that uses text across the entire sequence of clinical notes in each hospital stay for ICD coding, and incorporates embeddings for text metadata such as their position, time, and type of note. While using all clinical notes increases the quantity of data substantially, superconvergence can be used to reduce training costs. We evaluate the model on the MIMIC-III dataset. Our model exceeds the prior state-of-the-art when using only discharge summaries as input, and achieves further performance improvements when all clinical notes are used as input.
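The abstract credits superconvergence with keeping training affordable despite the much larger input. Superconvergence is usually obtained with a one-cycle learning-rate schedule: a short warm-up to a large peak rate, followed by a long anneal to a very small rate. The code is a generic cosine one-cycle schedule. The abstract does not give the paper's schedule, so `max_lr`, `pct_start`, `div_factor` and `final_div_factor` are assumed values. Their names follow PyTorch's `torch.optim.lr_scheduler.OneCycleLR`, which implements a comparable schedule.

```python
import math

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at 0-based `step` under a cosine one-cycle schedule."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = max(int(pct_start * total_steps), 1)
    if step < warmup_steps:
        # Phase 1: cosine rise from initial_lr up to max_lr
        frac = step / warmup_steps
        return initial_lr + (max_lr - initial_lr) * (1 - math.cos(math.pi * frac)) / 2
    # Phase 2: cosine anneal from max_lr down to min_lr at the final step
    frac = min((step - warmup_steps) / max(total_steps - 1 - warmup_steps, 1), 1.0)
    return min_lr + (max_lr - min_lr) * (1 + math.cos(math.pi * frac)) / 2

# Hypothetical run: 1000 optimiser steps with a peak rate of 1e-3
lrs = [one_cycle_lr(s, total_steps=1000, max_lr=1e-3) for s in range(1000)]
```

The intent of the large peak rate is to let the model reach good accuracy in fewer epochs. That is how superconvergence can offset the extra cost of feeding every note in a stay to the model.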

