SR CV · Feb 29, 2016

Clustering Based Feature Learning on Variable Stars

Cristóbal Mackenzie, Karim Pichara, Pavlos Protopapas

arXiv:1602.08977v1 · 28 citations

Originality: Incremental advance

AI Analysis

This work addresses astronomers' need for scalable, automated analysis pipelines to handle the vast amounts of data expected from future surveys like LSST. It is incremental in that it applies existing unsupervised learning methods to a new domain.

The paper tackles the automatic classification of variable stars by developing an unsupervised feature learning algorithm that extracts and clusters lightcurve subsequences to find common patterns. On the MACHO and OGLE datasets, it achieves classification performance comparable to or better than that of traditional expert-designed features, at significantly lower computational cost.

The success of automatic classification of variable stars strongly depends on the lightcurve representation. Usually, lightcurves are represented as a vector of many statistical descriptors designed by astronomers, called features. These descriptors commonly demand significant computational power to calculate, require substantial research effort to develop, and do not guarantee good performance on the final classification task. Today, lightcurve representation is not entirely automatic; algorithms that extract lightcurve features are designed by humans and must be manually tuned for every survey. The vast amounts of data that will be generated in future surveys like LSST mean astronomers must develop analysis pipelines that are both scalable and automated. Recently, substantial efforts have been made in the machine learning community to develop methods that dispense with expert-designed, manually tuned features in favor of features learned automatically from data. In this work we present what is, to our knowledge, the first unsupervised feature learning algorithm designed for variable stars. Our method first extracts a large number of lightcurve subsequences from a given set of photometric data, which are then clustered to find common local patterns in the time series. Representatives of these patterns, called exemplars, are then used to transform lightcurves of a labeled set into a new representation that can then be used to train an automatic classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias generated when the learning process is done only with labeled data. We test our method on MACHO and OGLE datasets; the results show that the classification performance we achieve is as good as, and in some cases better than, the performance achieved using traditional features, while the computational cost is significantly lower.
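The pipeline described in the abstract (extract subsequences, cluster them, keep exemplars, re-encode lightcurves) can be illustrated with a minimal sketch. This is not the authors' implementation, and everything specific in it is an assumption made for illustration: evenly sampled synthetic lightcurves, fixed-width sliding windows, Euclidean distance, a plain k-means loop standing in for whatever clustering the paper uses, and a normalized histogram of nearest-exemplar assignments as the new representation. The function names `extract_subsequences`, `find_exemplars`, and `encode` are hypothetical.

```python
import numpy as np

def extract_subsequences(flux, width, step):
    """Slide a fixed-width window over one lightcurve; z-normalize each window."""
    starts = range(0, len(flux) - width + 1, step)
    windows = np.array([flux[s:s + width] for s in starts], dtype=float)
    windows -= windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    return windows / np.where(std > 0, std, 1.0)

def find_exemplars(subsequences, k, n_iter=20, seed=0):
    """Cluster subsequences (k-means here, an assumption) and return, for each
    cluster, the real subsequence closest to its centroid as the exemplar."""
    rng = np.random.default_rng(seed)
    centroids = subsequences[rng.choice(len(subsequences), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(subsequences[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = subsequences[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    dists = np.linalg.norm(subsequences[:, None, :] - centroids[None, :, :], axis=2)
    return subsequences[dists.argmin(axis=0)]

def encode(flux, exemplars, width, step):
    """Represent a lightcurve as the fraction of its windows assigned to each
    exemplar (a bag-of-patterns encoding, assumed for illustration)."""
    windows = extract_subsequences(flux, width, step)
    dists = np.linalg.norm(windows[:, None, :] - exemplars[None, :, :], axis=2)
    counts = np.bincount(dists.argmin(axis=1), minlength=len(exemplars))
    return counts / counts.sum()

# Synthetic, evenly sampled "lightcurves" standing in for unlabeled survey data.
rng = np.random.default_rng(1)
t = np.linspace(0, 10, 200)
unlabeled = [np.sin(2 * np.pi * f * t) + 0.1 * rng.normal(size=t.size)
             for f in (0.5, 1.0, 2.0, 3.0)]

# Learn exemplars from the pooled subsequences, then encode one lightcurve.
pool = np.vstack([extract_subsequences(lc, width=20, step=5) for lc in unlabeled])
exemplars = find_exemplars(pool, k=8)
features = encode(unlabeled[0], exemplars, width=20, step=5)
```

In this sketch the exemplars are learned from all available lightcurves, labeled or not, and only the encoding step is applied to the labeled set; the resulting fixed-length vectors could then be passed to any standard classifier. Real survey lightcurves are unevenly sampled, so the windowing and distance used here would need to be adapted.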
