LG, IM · Aug 30, 2021

Noisy Labels for Weakly Supervised Gamma Hadron Classification

Lukas Pfahler, Mirko Bunse, Katharina Morik

arXiv:2108.13396v13.11 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of expensive data annotation in astronomy by enabling effective classification without simulated labels. It is incremental in that it builds on existing weak supervision methods.

The paper tackles gamma hadron classification in gamma ray astronomy with a noisy label approach that learns from unlabeled telescope data instead of costly simulated ground-truth labels. It achieves state-of-the-art results on this task and competitive performance on imbalanced datasets from other domains.

Gamma hadron classification, a central machine learning task in gamma ray astronomy, is conventionally tackled with supervised learning. However, the supervised approach requires annotated training data to be produced in sophisticated and costly simulations. We propose to instead solve gamma hadron classification with a noisy label approach that only uses unlabeled data recorded by the real telescope. To this end, we employ the significance of detection as a learning criterion, which addresses this form of weak supervision. We show that models based on the significance of detection deliver state-of-the-art results despite being trained exclusively with noisy labels; put differently, our models do not require the costly simulated ground-truth labels that astronomers otherwise employ for classifier training. Our weakly supervised models also exhibit competitive performance on imbalanced data sets from a variety of other application domains. In contrast to existing work on class-conditional label noise, we assume that only one of the class-wise noise rates is known.
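As a rough illustration of what a significance-based criterion can look like, the sketch below assumes the standard Li and Ma (1983, Eq. 17) significance of detection used in gamma ray astronomy; the abstract does not spell out the formula. It also assumes an on/off setup: the on region points at the source and contains gammas plus hadrons, the off regions contain background only, and `alpha` is the ratio of on to off exposure. The names `li_ma_significance` and `best_threshold` are hypothetical. The sketch only tunes a decision threshold on classifier scores by maximizing the significance; it is not the authors' training procedure.

```python
import math
from typing import Sequence, Tuple


def li_ma_significance(n_on: float, n_off: float, alpha: float) -> float:
    """Signed Li & Ma (1983, Eq. 17) significance of an excess in the on region.

    Assumption: alpha = on exposure / off exposure, so the expected
    background in the on region is alpha * n_off.
    """
    if alpha <= 0 or n_on < 0 or n_off < 0 or n_on + n_off == 0:
        raise ValueError("need alpha > 0, non-negative counts, and at least one event")
    total = n_on + n_off
    # Use the convention 0 * log(0) = 0 so that empty on or off counts are allowed.
    term_on = n_on * math.log((1 + alpha) / alpha * n_on / total) if n_on > 0 else 0.0
    term_off = n_off * math.log((1 + alpha) * n_off / total) if n_off > 0 else 0.0
    # The likelihood-ratio statistic is non-negative; clamp tiny rounding errors.
    stat = max(term_on + term_off, 0.0)
    # Positive for an excess over the expected background, negative for a deficit.
    sign = 1.0 if n_on > alpha * n_off else -1.0
    return sign * math.sqrt(2.0 * stat)


def best_threshold(
    on_scores: Sequence[float], off_scores: Sequence[float], alpha: float
) -> Tuple[float, float]:
    """Pick the score threshold that maximizes the significance of detection.

    Events with score >= threshold count as predicted gammas. Only the noisy
    on/off region membership is needed, no simulated ground-truth labels.
    """
    best_t, best_s = math.nan, -math.inf
    # Every observed score is a candidate, so each threshold keeps at least one event.
    for t in sorted(set(on_scores) | set(off_scores)):
        n_on = sum(s >= t for s in on_scores)
        n_off = sum(s >= t for s in off_scores)
        sig = li_ma_significance(n_on, n_off, alpha)
        if sig > best_s:
            best_t, best_s = t, sig
    return best_t, best_s


if __name__ == "__main__":
    # Hypothetical classifier scores for events from the on and off regions.
    on = [0.9, 0.8, 0.7, 0.2, 0.1]
    off = [0.3, 0.2, 0.1, 0.25, 0.15, 0.2, 0.1, 0.3, 0.05, 0.2]
    threshold, significance = best_threshold(on, off, alpha=0.5)
    print(f"threshold={threshold}, significance={significance:.3f}")
```

A trainable variant could replace the hard counts with sums of predicted gamma probabilities so that the criterion becomes differentiable; that substitution is an assumption here, not a detail taken from the abstract.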

