Machine Learning Pipeline for Pulsar Star Dataset
This work addresses pulsar identification for astronomy, but it is incremental as it applies existing methods to a known dataset.
The study compares common machine learning algorithms on the unbalanced HTRU2 pulsar dataset of almost 17,000 observations and finds that standard algorithms can achieve promising accuracy ratios despite noise and class imbalance.
This work brings together some of the most common machine learning (ML) algorithms and compares the results they obtain on an unbalanced dataset. The dataset (HTRU2) comprises almost 17 thousand observations of astronomical objects, collected to identify pulsars. The methodological proposal is to evaluate the accuracy of these models on the same dataset after treating it with two different strategies for unbalanced data. The results show that, despite the noise and class imbalance present in this type of data, standard ML algorithms can be applied to it and obtain promising accuracy ratios.
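The abstract does not name the two strategies used for unbalanced data. As an illustration only, the sketch below assumes they are random undersampling and random oversampling, two common choices; the helper names `random_undersample` and `random_oversample` and the synthetic data are hypothetical and do not come from the paper.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Synthetic stand-in for an unbalanced dataset (assumption: 100 negatives, 10 positives).
rows = [[float(i)] for i in range(110)]
labels = [0] * 100 + [1] * 10

under_rows, under_labels = random_undersample(rows, labels)
over_rows, over_labels = random_oversample(rows, labels)

print(Counter(under_labels))  # class counts after undersampling
print(Counter(over_labels))   # class counts after oversampling
```

In a comparison of this kind, rebalancing is normally applied to the training split only, so that the test split keeps the original class proportions and the reported accuracy reflects the real imbalance.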