Classifying Cool Dwarfs: Comprehensive Spectral Typing of Field and Peculiar Dwarfs Using Machine Learning
This work addresses the need for efficient classification in large spectroscopic surveys like Gaia and SDSS, but it is incremental as it applies existing ML methods to a specific astronomical domain.
The paper tackled the problem of automating spectral type classification for low-mass stars and brown dwarfs using machine learning on low-resolution near-infrared spectra, achieving 95.5% accuracy within ±1 spectral type and 89.5% accuracy for gravity and metallicity subclasses with a KNN model.
Low-mass stars and brown dwarfs -- spectral types (SpTs) M0 and later -- play a significant role in studying stellar and substellar processes and demographics, reaching down to planetary-mass objects. Currently, the classification of these sources remains heavily reliant on visual inspection of spectral features, equivalent width measurements, or narrow-/wide-band spectral indices. Recent advances in machine learning (ML) methods offer automated approaches for spectral typing, which are becoming increasingly important as large spectroscopic surveys such as Gaia, SDSS, and SPHEREx generate datasets containing millions of spectra. We investigate the application of ML in spectral type classification on low-resolution (R $\sim$ 120) near-infrared spectra of M0--T9 dwarfs obtained with the SpeX instrument on the NASA Infrared Telescope Facility. We specifically aim to classify the gravity- and metallicity-dependent subclasses for late-type dwarfs. We used binned fluxes as input features and compared the efficacy of spectral type estimators built using Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) models. We tested the influence of different normalizations and analyzed the relative importance of different spectral regions for surface gravity and metallicity subclass classification. Our best-performing model (using KNN) classifies 95.5 $\pm$ 0.6% of sources to within $\pm$1 SpT, and assigns surface gravity and metallicity subclasses with 89.5 $\pm$ 0.9% accuracy. We test the dependence of signal-to-noise ratio on classification accuracy and find sources with SNR $\gtrsim$ 60 have $\gtrsim$ 95% accuracy. We also find that zy-band plays the most prominent role in the RF model, with FeH and TiO having the highest feature importance.