Lightweight Baselines for Medical Abstract Classification: DistilBERT with Cross-Entropy as a Strong Default
This work addresses efficient medical text classification for resource-limited settings, offering incremental improvements through optimized training and calibration techniques.
The study evaluates lightweight methods for medical abstract classification under budget constraints and finds that DistilBERT with cross-entropy loss gives the strongest raw (argmax) performance, while post-hoc calibration further improves deployed metrics such as Macro F1.
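The post-hoc calibration step can be illustrated with a small sketch. The summary does not spell out the exact procedure, so this assumes one common variant. Per-class additive offsets are subtracted from the softmax probabilities before the argmax. The offsets are chosen by coordinate ascent over a fixed grid to maximize Macro F1 on the validation split and are then frozen for the test split. The helper names (`macro_f1`, `predict_with_offsets`, `fit_classwise_offsets`), the grid, and the number of rounds are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating validation-tuned, class-wise decision
# offsets. This is an assumed variant, not the paper's implementation.


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1; a class with no TP, FP, or FN scores 0."""
    f1s = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_classes


def predict_with_offsets(probs: Sequence[Sequence[float]], offsets: Sequence[float]) -> List[int]:
    """Subtract each class's offset from its probability, then take the argmax."""
    return [max(range(len(p)), key=lambda c: p[c] - offsets[c]) for p in probs]


def fit_classwise_offsets(
    probs: Sequence[Sequence[float]],
    y_val: Sequence[int],
    grid: Sequence[float] = tuple(i / 20 - 0.5 for i in range(21)),  # -0.5 .. 0.5, step 0.05
    n_rounds: int = 3,
) -> Tuple[List[float], float]:
    """Coordinate ascent: tune one class offset at a time to maximize validation Macro F1."""
    n_classes = len(probs[0])
    offsets = [0.0] * n_classes  # all-zero offsets reproduce the plain argmax
    best = macro_f1(y_val, predict_with_offsets(probs, offsets), n_classes)
    for _ in range(n_rounds):
        for c in range(n_classes):
            for t in grid:
                trial = list(offsets)
                trial[c] = t
                score = macro_f1(y_val, predict_with_offsets(probs, trial), n_classes)
                if score > best:  # keep only strict improvements
                    best, offsets = score, trial
    return offsets, best
```

The offsets would be fitted once on validation probabilities and applied unchanged to test probabilities. On a small validation set they can overfit, so a coarser grid or fewer rounds is a reasonable safeguard.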
We evaluate how much performance lightweight methods for medical abstract classification can reach under a constrained financial budget. On the public medical abstracts corpus, we fine-tune BERT-base and DistilBERT with three objectives: cross-entropy (CE), class-weighted CE, and focal loss. Tokenization, sequence length, optimizer, and schedule are identical across runs. DistilBERT with plain CE gives the strongest raw argmax trade-off, while post-hoc operating-point selection (validation-calibrated, class-wise thresholds) substantially improves deployed performance; under this tuned regime, focal loss benefits most. We report Accuracy, Macro F1, and Weighted F1, release evaluation artifacts, and include confusion analyses to clarify the error structure. The practical takeaway is to start with a compact encoder and CE, then add lightweight calibration or thresholding when deployment requires better macro-level balance across classes.
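The three training objectives compared above differ only in how the per-example loss is weighted. The sketch below computes them for a single example from raw logits, in plain Python for readability. In a real fine-tuning run they would be computed on batched tensors inside the training framework. The inverse-frequency weights w_c = N / (K * n_c) and the focal exponent gamma = 2 are common defaults assumed here, not settings reported in the abstract. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (subtracts the max logit first)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def cross_entropy(logits: Sequence[float], y: int) -> float:
    """Plain CE: negative log-probability of the true class y."""
    return -log_softmax(logits)[y]


def weighted_cross_entropy(logits: Sequence[float], y: int, class_weights: Sequence[float]) -> float:
    """Class-weighted CE: CE scaled by the weight of the true class."""
    return class_weights[y] * cross_entropy(logits, y)


def focal_loss(logits: Sequence[float], y: int, gamma: float = 2.0) -> float:
    """Focal loss: CE down-weighted by (1 - p_y) ** gamma; gamma = 0 recovers CE."""
    log_p = log_softmax(logits)[y]
    p = math.exp(log_p)
    return -((1.0 - p) ** gamma) * log_p


def inverse_frequency_weights(labels: Sequence[int], n_classes: int) -> List[float]:
    """Assumed weighting scheme: w_c = N / (K * n_c); classes absent from labels get 0."""
    counts = [0] * n_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (n_classes * c) if c else 0.0 for c in counts]
```

Focal loss shrinks the contribution of confidently correct examples much more than that of misclassified ones, whereas class-weighted CE rescales every example of a class by the same factor. Batch reduction conventions vary. For example, PyTorch's `CrossEntropyLoss` with a `weight` argument and mean reduction divides by the sum of the target-class weights rather than by the batch size.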