Improving Solar Flare Prediction by Time Series Outlier Detection
This work addresses the reliability of solar flare prediction models, which matters for protecting space technology and infrastructure, but the contribution is incremental because it focuses on outlier detection in an existing dataset.
The study tackled the problem of solar flare prediction by investigating the impact of outliers in a multivariate time series dataset, finding that removing outliers with Isolation Forest led, in the best case, to a 279% increase in True Skill Statistic and a 68% increase in Heidke Skill Score.
Solar flares not only pose risks to space technologies and astronauts' well-being, but also disrupt the high-tech, interconnected infrastructure on Earth that our lives depend on. While a number of machine-learning methods have been proposed to improve flare prediction, none of them, to the best of our knowledge, has investigated the impact of outliers on the reliability and performance of those models. In this study, we investigate the impact of outliers in a multivariate time series benchmark dataset, namely SWAN-SF, on flare prediction models, and test our hypothesis that SWAN-SF contains outliers whose removal enhances the performance of prediction models on unseen datasets. We employ Isolation Forest to detect outliers among the weaker flare instances. We carry out several experiments over a wide range of contamination rates, which determine the assumed percentage of outliers present. We assess the quality of each dataset in terms of its actual contamination using TimeSeriesSVC. In our best finding, we achieve a 279% increase in True Skill Statistic and a 68% increase in Heidke Skill Score. The results show that, overall, flare prediction can be significantly improved if outliers are detected and removed properly.
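For readers unfamiliar with the two skill scores, the following is a minimal sketch of how they are commonly computed from a binary confusion matrix, and of what a relative increase such as "279%" means. The abstract does not state which definitions the study uses, so the formulas below are assumptions: TSS is taken as recall minus false-alarm rate, and the Heidke Skill Score follows the form often called HSS2 in the flare forecasting literature. The function names and the toy labels are hypothetical and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels, where 1 = flare, 0 = no flare."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def tss(tp, fn, fp, tn):
    """True Skill Statistic: recall minus false-alarm rate (range -1 to 1)."""
    return tp / (tp + fn) - fp / (fp + tn)


def hss(tp, fn, fp, tn):
    """Heidke Skill Score in the HSS2 form (assumed here, not confirmed by the source)."""
    numerator = 2 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return numerator / denominator


def relative_increase(before, after):
    """Percentage increase of `after` over `before`."""
    return 100.0 * (after - before) / before


# Toy labels, made up purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print("TSS:", tss(*counts))
print("HSS:", hss(*counts))

# A score that rises from 0.1 to 0.3 has increased by about 200%.
print("Relative increase (%):", relative_increase(0.1, 0.3))
```

Under these definitions, a relative increase is measured against the baseline score, so a large percentage can reflect a low starting value as much as a high final one.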