CL · Apr 17, 2021

Three-level Hierarchical Transformer Networks for Long-sequence and Multiple Clinical Documents Classification

arXiv:2104.08444v2 · 2.0 · 12 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of handling long sequences in clinical document classification for healthcare applications. The contribution is incremental, since it builds on existing Transformer architectures.

The authors tackled the problem of modeling long-term dependencies across clinical notes for patient-level prediction by proposing a Three-level Hierarchical Transformer Network (3-level-HTN), which outperformed previous state-of-the-art models like BigBird on the MIMIC-III dataset.

We present a Three-level Hierarchical Transformer Network (3-level-HTN) for modeling long-term dependencies across clinical notes for the purpose of patient-level prediction. The network is equipped with three levels of Transformer-based encoders that learn progressively from words to sentences, sentences to notes, and finally notes to patients. The first level, from word to sentence, directly applies a pre-trained BERT model as a fully trainable component. The second and third levels each implement a stack of Transformer-based encoders, and the final patient representation is fed into a classification layer for clinical predictions. Compared to conventional BERT models, our model increases the maximum input length from 512 tokens to much longer sequences that are appropriate for modeling large numbers of clinical notes. We empirically examine different hyper-parameters to identify an optimal trade-off given computational resource limits. Our experimental results on the MIMIC-III dataset for different prediction tasks demonstrate that the proposed Hierarchical Transformer Network outperforms previous state-of-the-art models, including but not limited to BigBird.
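To make the word → sentence → note → patient data flow concrete, here is a minimal, untrained sketch in pure Python. It is not the authors' implementation. Several things are assumptions: each level (including the BERT level) is replaced by a single-head self-attention layer with a residual connection (no layer norm, feed-forward, or multi-layer stack); each level is collapsed to one vector by mean pooling (the paper may pool differently, e.g. with a [CLS]-style token); weights are random; and the names `ThreeLevelHTN`, `SelfAttentionEncoder` and `max_input_tokens`, as well as the token counts, are hypothetical. The `max_input_tokens` helper illustrates why a hierarchy can exceed BERT's 512-token limit: each level only attends over a short sequence, but the capacities multiply.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Random (rows x cols) matrix; stands in for learned weights."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def mean_pool(x):
    """Average a sequence of vectors into one vector (assumed pooling choice)."""
    return [sum(col) / len(x) for col in zip(*x)]


class SelfAttentionEncoder:
    """Hypothetical simplified encoder: one single-head self-attention layer plus residual."""

    def __init__(self, dim, rng):
        self.wq = rand_matrix(dim, dim, rng)
        self.wk = rand_matrix(dim, dim, rng)
        self.wv = rand_matrix(dim, dim, rng)
        self.scale = 1.0 / math.sqrt(dim)

    def __call__(self, x):
        q, k, v = matmul(x, self.wq), matmul(x, self.wk), matmul(x, self.wv)
        out = []
        for i, qi in enumerate(q):
            # Scaled dot-product attention weights of position i over all positions.
            weights = softmax([self.scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
            attended = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(qi))]
            out.append([xi + ai for xi, ai in zip(x[i], attended)])
        return out


class ThreeLevelHTN:
    """Hypothetical toy version of the three-level hierarchy (untrained, random weights)."""

    def __init__(self, vocab_size, dim, num_labels, seed=0):
        rng = random.Random(seed)
        self.embed = rand_matrix(vocab_size, dim, rng)           # stand-in for BERT embeddings
        self.word_to_sentence = SelfAttentionEncoder(dim, rng)   # level 1 (BERT in the paper)
        self.sentence_to_note = SelfAttentionEncoder(dim, rng)   # level 2
        self.note_to_patient = SelfAttentionEncoder(dim, rng)    # level 3
        self.classifier = rand_matrix(dim, num_labels, rng)      # classification layer

    def __call__(self, patient):
        """patient: list of notes; note: list of sentences; sentence: list of token ids."""
        note_vecs = []
        for note in patient:
            sent_vecs = [mean_pool(self.word_to_sentence([self.embed[t] for t in sent]))
                         for sent in note]
            note_vecs.append(mean_pool(self.sentence_to_note(sent_vecs)))
        patient_vec = mean_pool(self.note_to_patient(note_vecs))
        logits = matmul([patient_vec], self.classifier)[0]
        return softmax(logits)  # class probabilities for one patient


def max_input_tokens(tokens_per_sentence, sentences_per_note, notes_per_patient):
    """Total tokens a hierarchy can cover: capacities at each level multiply."""
    return tokens_per_sentence * sentences_per_note * notes_per_patient


model = ThreeLevelHTN(vocab_size=100, dim=8, num_labels=2)
patient = [
    [[1, 2, 3], [4, 5]],  # note 1: two sentences
    [[6, 7, 8, 9]],       # note 2: one sentence
]
probs = model(patient)
print(len(probs), round(sum(probs), 6))
print(max_input_tokens(32, 32, 16))  # hypothetical sizes, far beyond 512 tokens
```

With these illustrative sizes the hierarchy covers 32 × 32 × 16 = 16,384 tokens, while no single attention call sees more than 32 items. This is the trade-off the hyper-parameter study weighs against compute limits.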

