CL · AI · Oct 11, 2025

Lightweight Baselines for Medical Abstract Classification: DistilBERT with Cross-Entropy as a Strong Default

arXiv:2510.10025v2 · 5 citations · h-index: 5 · 2025 8th International Conference on Machine Learning and Natural Language Processing (MLNLP)
Originality Synthesis-oriented
AI Analysis

This work addresses efficient medical text classification for resource-limited settings, offering incremental improvements through optimized training and calibration techniques.

The study evaluated lightweight methods for medical abstract classification under budget constraints, finding that DistilBERT with cross-entropy loss provided the strongest performance, with post-hoc calibration further improving deployed metrics like Macro F1.

We evaluate lightweight methods for medical abstract classification to establish the best performance achievable under a limited financial budget. On the public medical abstracts corpus, we fine-tune BERT base and DistilBERT with three objectives, cross-entropy (CE), class-weighted CE, and focal loss, under identical tokenization, sequence length, optimizer, and schedule. DistilBERT with plain CE gives the strongest raw argmax trade-off, while post hoc operating-point selection (validation-calibrated, classwise thresholds) substantially improves deployed performance; under this tuned regime, focal loss benefits most. We report Accuracy, Macro F1, and Weighted F1, release evaluation artifacts, and include confusion analyses to clarify error structure. The practical takeaway is to start with a compact encoder and CE, then add lightweight calibration or thresholding when deployment requires higher macro balance.
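To make the three objectives and the thresholding step concrete, here is a minimal pure-Python sketch. It is not the authors' code: the function names, the focal `gamma=2.0` default, the subtractive threshold rule (`p_k - t_k`), and the greedy per-class grid search are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one example's logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target, class_weights=None):
    # Plain CE, or class-weighted CE when class_weights is given.
    p_t = softmax(logits)[target]
    weight = 1.0 if class_weights is None else class_weights[target]
    return -weight * math.log(p_t)

def focal_loss(logits, target, gamma=2.0):
    # Focal loss: CE scaled by (1 - p_t)^gamma; gamma=0 recovers plain CE.
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def predict(probs, thresholds):
    # Assumed decision rule: argmax of p_k - t_k.
    # With equal thresholds this is the ordinary argmax.
    scores = [p - t for p, t in zip(probs, thresholds)]
    return max(range(len(scores)), key=scores.__getitem__)

def macro_f1(y_true, y_pred, num_classes):
    # Unweighted mean of per-class F1 (F1 = 0 for a class with no support or predictions).
    f1s = []
    for k in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != k and p == k)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == k and p != k)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes

def fit_thresholds(val_probs, val_labels, num_classes, grid=None):
    # Greedy search: tune one class threshold at a time on validation data,
    # keeping a change only if it strictly improves validation Macro F1.
    grid = grid or [i / 20 for i in range(-10, 11)]
    thresholds = [0.0] * num_classes
    best = macro_f1(val_labels, [predict(p, thresholds) for p in val_probs], num_classes)
    for k in range(num_classes):
        best_t = thresholds[k]
        for t in grid:
            trial = thresholds[:k] + [t] + thresholds[k + 1:]
            preds = [predict(p, trial) for p in val_probs]
            score = macro_f1(val_labels, preds, num_classes)
            if score > best:
                best, best_t = score, t
        thresholds[k] = best_t
    return thresholds, best
```

In this sketch the thresholds are fitted on validation probabilities only and then applied unchanged at test time, which matches the abstract's description of a post hoc, validation-calibrated operating point. By construction the fitted thresholds never score below plain argmax on the validation set, though test-set gains are not guaranteed.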

