Loss Design and Architecture Selection for Long-Tailed Multi-Label Chest X-Ray Classification
This work addresses rare-disease detection in clinical imaging, a problem of direct relevance to medical practitioners, but its contribution is incremental: it is an empirical evaluation of existing methods.
The paper tackles long-tailed class distributions in multi-label chest X-ray classification by evaluating loss functions and architectures. It finds that LDAM-DRW outperforms standard BCE and asymmetric losses on rare classes and that ConvNeXt-Large reaches 0.5220 mAP on the development set. The submission ranked 5th on the official test leaderboard with 0.3950 mAP.
Long-tailed class distributions pose a significant challenge for multi-label chest X-ray (CXR) classification, where rare but clinically important findings are severely underrepresented. In this work, we present a systematic empirical evaluation of loss functions, CNN backbone architectures, and post-training strategies on the CXR-LT 2026 benchmark, which comprises approximately 143K images with 30 disease labels from PadChest. Our experiments demonstrate that LDAM with deferred re-weighting (LDAM-DRW) consistently outperforms standard BCE and asymmetric losses for rare-class recognition. Amongst the architectures evaluated, ConvNeXt-Large achieves the best single-model performance, with 0.5220 mAP and 0.3765 F1 on our development set, whilst classifier re-training and test-time augmentation further improve ranking metrics. On the official test leaderboard, our submission achieved 0.3950 mAP, ranking 5th amongst the 68 participating teams, which made a total of 1528 submissions. We provide a candid analysis of the development-to-test performance gap and discuss practical insights for handling class imbalance in clinical imaging settings. Code is available at https://github.com/Nikhil-Rao20/Long_Tail.
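To make the LDAM-DRW idea concrete, the following is a minimal sketch of one way it could be adapted to a multi-label setting; it is not taken from the released code. It assumes that each label is treated as an independent binary problem, that the margin is subtracted from the logit of positive labels only, and that the deferred re-weighting stage uses class-balanced "effective number" weights. The function names and the hyperparameters max_margin, beta, and defer_until are illustrative, not values reported in the abstract.

```python
import math

def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j^(-1/4); the rarest class gets max_margin."""
    raw = [n ** -0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [r * scale for r in raw]

def drw_weights(class_counts, epoch, defer_until, beta=0.9999):
    """Deferred re-weighting: uniform weights early, class-balanced weights later.

    The late-stage weights follow the effective-number formula
    (1 - beta) / (1 - beta^n_j), normalised to sum to the number of classes.
    """
    k = len(class_counts)
    if epoch < defer_until:
        return [1.0] * k
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    norm = k / sum(raw)
    return [r * norm for r in raw]

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def ldam_bce(logits, targets, margins, weights):
    """Weighted binary cross-entropy for one sample with LDAM-style margins.

    Assumption: the margin is applied to positive labels only, so a rare
    positive must be predicted with a larger logit to reach the same loss.
    """
    total = 0.0
    for z, y, m, w in zip(logits, targets, margins, weights):
        if y == 1:
            loss = softplus(-(z - m))  # -log sigmoid(z - m)
        else:
            loss = softplus(z)         # -log(1 - sigmoid(z))
        total += w * loss
    return total / len(logits)

# Hypothetical label frequencies: one common, one uncommon, one rare finding.
counts = [10000, 100, 10]
margins = ldam_margins(counts)
early = drw_weights(counts, epoch=5, defer_until=20)
late = drw_weights(counts, epoch=25, defer_until=20)
sample_loss = ldam_bce([2.0, -1.0, 0.3], [1, 0, 1], margins, late)
```

In this sketch the rare label receives both the largest margin and, once the deferred stage begins, the largest weight, which is the mechanism by which LDAM-DRW is intended to favour tail classes without distorting the early phase of representation learning.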