CL · Nov 28, 2023

Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification

arXiv:2311.16650v1131 citations · h-index: 18
AI Analysis

This paper addresses imbalanced classification of medical texts for healthcare applications. Its approach relies only on the internal label hierarchy and requires no external data.

The paper tackles the classification of imbalanced and scarce medical text by proposing Text2Tree, a framework-agnostic algorithm that aligns text representations with the label hierarchy. The authors report superior performance over existing imbalanced classification methods on public datasets and real-world medical records.

Deep learning approaches perform promisingly on various text tasks. However, they still struggle with medical text classification because samples are often extremely imbalanced and scarce. Unlike existing mainstream approaches, which focus on supplementing semantics with external medical information, this paper rethinks the data challenges in medical texts and presents a novel framework-agnostic algorithm, Text2Tree, that uses only the internal label hierarchy when training deep learning models. The ICD code tree structure of the labels is embedded into cascade attention modules to learn hierarchy-aware label representations. Two new learning schemes are devised to boost text classification: Similarity Surrogate Learning (SSL), which reuses samples of other labels, and Dissimilarity Mixup Learning (DML), which distinguishes samples of other labels, both following the label representation hierarchy. Experiments on authoritative public datasets and real-world medical records show that the approach consistently outperforms classical and advanced imbalanced classification methods.
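To make the idea of an internal label hierarchy concrete, the sketch below scores how close two labels sit in a code tree by counting their shared ancestors. This is an illustration only, not the paper's method: Text2Tree learns label representations with cascade attention modules, whereas this uses a fixed, hand-written measure. The toy tree, the `PARENT` mapping, and the function names `path_to_root` and `tree_similarity` are all assumptions introduced here; the codes are merely ICD-style and their grouping is simplified.

```python
from typing import Dict, List, Optional

# Hypothetical toy label tree (child -> parent); None marks the root.
# The grouping is simplified for illustration and is not the real ICD chapter layout.
PARENT: Dict[str, Optional[str]] = {
    "ROOT": None,
    "E": "ROOT",
    "I": "ROOT",
    "E10": "E",
    "E11": "E",
    "I10": "I",
    "E11.2": "E11",
    "E11.9": "E11",
}


def path_to_root(label: str) -> List[str]:
    """Return the list of nodes from the root down to `label`."""
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def tree_similarity(a: str, b: str) -> float:
    """Fraction of non-root ancestors shared by `a` and `b`, in [0, 1]."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    longest = max(len(path_a), len(path_b))
    if longest == 1:  # both labels are the root itself
        return 1.0
    # Subtract 1 on both sides so the always-shared root does not count.
    return (shared - 1) / (longest - 1)


if __name__ == "__main__":
    for pair in [("E11.2", "E11.9"), ("E10", "E11.9"), ("E11.9", "I10")]:
        print(pair, round(tree_similarity(*pair), 3))
```

Under these assumptions, sibling codes such as `E11.2` and `E11.9` score high, while codes from different branches score zero. In the spirit of the abstract, high-similarity labels would be the natural candidates for sample reuse and low-similarity labels for contrastive mixing, although how SSL and DML actually weight samples is not specified in this summary.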
