SRAug 29, 2023
Efficient labeling of solar flux evolution videos by a deep learning modelSubhamoy Chatterjee, Andrés Muñoz-Jaramillo, Derek A. Lamb
Machine learning (ML) is becoming a critical tool for interrogation of large complex data. Labeling, defined as the process of adding meaningful annotations, is a crucial step of supervised ML. However, labeling datasets is time consuming. Here we show that convolutional neural networks (CNNs), trained on crudely labeled astronomical videos, can be leveraged to improve the quality of data labeling and reduce the need for human intervention. We use videos of the solar magnetic field, crudely labeled into two classes: emergence or non-emergence of bipolar magnetic regions (BMRs), based on their first detection on the solar disk. We train CNNs using crude labels, manually verify, correct labeling vs. CNN disagreements, and repeat this process until convergence. Traditionally, flux emergence labelling is done manually. We find that a high-quality labeled dataset, derived through this iterative process, reduces the necessary manual verification by 50%. Furthermore, by gradually masking the videos and looking for maximum change in CNN inference, we locate BMR emergence time without retraining the CNN. This demonstrates the versatility of CNNs for simplifying the challenging task of labeling complex dynamic events.
SRAug 29, 2023
Probabilistic solar flare forecasting using historical magnetogram dataKiera van der Sande, Andrés Muñoz-Jaramillo, Subhamoy Chatterjee
Solar flare forecasting research using machine learning (ML) has focused on high resolution magnetogram data from the SDO/HMI era covering Solar Cycle 24 and the start of Solar Cycle 25, with some efforts looking back to SOHO/MDI for data from Solar Cycle 23. In this paper, we consider over 4 solar cycles of daily historical magnetogram data from multiple instruments. This is the first attempt to take advantage of this historical data for ML-based flare forecasting. We apply a convolutional neural network (CNN) to extract features from full-disk magnetograms together with a logistic regression model to incorporate scalar features based on magnetograms and flaring history. We use an ensemble approach to generate calibrated probabilistic forecasts of M-class or larger flares in the next 24 hours. Overall, we find that including historical data improves forecasting skill and reliability. We show that single frame magnetograms do not contain significantly more relevant information than can be summarized in a small number of scalar features, and that flaring history has greater predictive power than our CNN-extracted features. This indicates the importance of including temporal information in flare forecasting models.
SRAug 20, 2023
Homogenising SoHO/EIT and SDO/AIA 171Å$~$ Images: A Deep Learning ApproachSubhamoy Chatterjee, Andrés Muñoz-Jaramillo, Maher Dayeh et al.
Extreme Ultraviolet images of the Sun are becoming an integral part of space weather prediction tasks. However, having different surveys requires the development of instrument-specific prediction algorithms. As an alternative, it is possible to combine multiple surveys to create a homogeneous dataset. In this study, we utilize the temporal overlap of SoHO/EIT and SDO/AIA 171~Å~surveys to train an ensemble of deep learning models for creating a single homogeneous survey of EUV images for 2 solar cycles. Prior applications of deep learning have focused on validating the homogeneity of the output while overlooking the systematic estimation of uncertainty. We use an approach called `Approximate Bayesian Ensembling' to generate an ensemble of models whose uncertainty mimics that of a fully Bayesian neural network at a fraction of the cost. We find that ensemble uncertainty goes down as the training set size increases. Additionally, we show that the model ensemble adds immense value to the prediction by showing higher uncertainty in test data that are not well represented in the training data.
SRAug 18, 2025
Surya: Foundation Model for HeliophysicsSujit Roy, Johannes Schmude, Rohit Lal et al.
Heliophysics is central to understanding and forecasting space weather events and solar activity. Despite decades of high-resolution observations from the Solar Dynamics Observatory (SDO), most models remain task-specific and constrained by scarce labeled data, limiting their capacity to generalize across solar phenomena. We introduce Surya, a 366M parameter foundation model for heliophysics designed to learn general-purpose solar representations from multi-instrument SDO observations, including eight Atmospheric Imaging Assembly (AIA) channels and five Helioseismic and Magnetic Imager (HMI) products. Surya employs a spatiotemporal transformer architecture with spectral gating and long--short range attention, pretrained on high-resolution solar image forecasting tasks and further optimized through autoregressive rollout tuning. Zero-shot evaluations demonstrate its ability to forecast solar dynamics and flare events, while downstream fine-tuning with parameter-efficient Low-Rank Adaptation (LoRA) shows strong performance on solar wind forecasting, active region segmentation, solar flare forecasting, and EUV spectra. Surya is the first foundation model in heliophysics that uses time advancement as a pretext task on full-resolution SDO data. Its novel architecture and performance suggest that the model is able to learn the underlying physics behind solar evolution.
SRAug 18, 2025
SuryaBench: Benchmark Dataset for Advancing Machine Learning in Heliophysics and Space Weather PredictionSujit Roy, Dinesha V. Hegde, Johannes Schmude et al.
This paper introduces a high resolution, machine learning-ready heliophysics dataset derived from NASA's Solar Dynamics Observatory (SDO), specifically designed to advance machine learning (ML) applications in solar physics and space weather forecasting. The dataset includes processed imagery from the Atmospheric Imaging Assembly (AIA) and Helioseismic and Magnetic Imager (HMI), spanning a solar cycle from May 2010 to July 2024. To ensure suitability for ML tasks, the data has been preprocessed, including correction of spacecraft roll angles, orbital adjustments, exposure normalization, and degradation compensation. We also provide auxiliary application benchmark datasets complementing the core SDO dataset. These provide benchmark applications for central heliophysics and space weather tasks such as active region segmentation, active region emergence forecasting, coronal field extrapolation, solar flare prediction, solar EUV spectra prediction, and solar wind speed estimation. By establishing a unified, standardized data collection, this dataset aims to facilitate benchmarking, enhance reproducibility, and accelerate the development of AI-driven models for critical space weather prediction tasks, bridging gaps between solar physics, machine learning, and operational forecasting.
LGOct 23, 2025
CIPHER: Scalable Time Series Analysis for Physical Sciences with Application to Solar Wind PhenomenaJasmine R. Kobayashi, Daniela Martin, Valmir P Moraes Filho et al.
Labeling or classifying time series is a persistent challenge in the physical sciences, where expert annotations are scarce, costly, and often inconsistent. Yet robust labeling is essential to enable machine learning models for understanding, prediction, and forecasting. We present the \textit{Clustering and Indexation Pipeline with Human Evaluation for Recognition} (CIPHER), a framework designed to accelerate large-scale labeling of complex time series in physics. CIPHER integrates \textit{indexable Symbolic Aggregate approXimation} (iSAX) for interpretable compression and indexing, density-based clustering (HDBSCAN) to group recurring phenomena, and a human-in-the-loop step for efficient expert validation. Representative samples are labeled by domain scientists, and these annotations are propagated across clusters to yield systematic, scalable classifications. We evaluate CIPHER on the task of classifying solar wind phenomena in OMNI data, a central challenge in space weather research, showing that the framework recovers meaningful phenomena such as coronal mass ejections and stream interaction regions. Beyond this case study, CIPHER highlights a general strategy for combining symbolic representations, unsupervised learning, and expert knowledge to address label scarcity in time series across the physical sciences. The code and configuration files used in this study are publicly available to support reproducibility.
SRDec 2, 2020
RotNet: Fast and Scalable Estimation of Stellar Rotation Periods Using Convolutional Neural NetworksJ. Emmanuel Johnson, Sairam Sundaresan, Tansu Daylan et al.
Magnetic activity in stars manifests as dark spots on their surfaces that modulate the brightness observed by telescopes. These light curves contain important information on stellar rotation. However, the accurate estimation of rotation periods is computationally expensive due to scarce ground truth information, noisy data, and large parameter spaces that lead to degenerate solutions. We harness the power of deep learning and successfully apply Convolutional Neural Networks to regress stellar rotation periods from Kepler light curves. Geometry-preserving time-series to image transformations of the light curves serve as inputs to a ResNet-18 based architecture which is trained through transfer learning. The McQuillan catalog of published rotation periods is used as ansatz to groundtruth. We benchmark the performance of our method against a random forest regressor, a 1D CNN, and the Auto-Correlation Function (ACF) - the current standard to estimate rotation periods. Despite limiting our input to fewer data points (1k), our model yields more accurate results and runs 350 times faster than ACF runs on the same number of data points and 10,000 times faster than ACF runs on 65k data points. With only minimal feature engineering our approach has impressive accuracy, motivating the application of deep learning to regress stellar parameters on an even larger scale
SRNov 4, 2019
Single-Frame Super-Resolution of Solar Magnetograms: Investigating Physics-Based Metrics & LossesAnna Jungbluth, Xavier Gitiaux, Shane A. Maloney et al.
Breakthroughs in our understanding of physical phenomena have traditionally followed improvements in instrumentation. Studies of the magnetic field of the Sun, and its influence on the solar dynamo and space weather events, have benefited from improvements in resolution and measurement frequency of new instruments. However, in order to fully understand the solar cycle, high-quality data across time-scales longer than the typical lifespan of a solar instrument are required. At the moment, discrepancies between measurement surveys prevent the combined use of all available data. In this work, we show that machine learning can help bridge the gap between measurement surveys by learning to super-resolve low-resolution magnetic field images and translate between characteristics of contemporary instruments in orbit. We also introduce the notion of physics-based metrics and losses for super-resolution to preserve underlying physics and constrain the solution space of possible super-resolution outputs.
LGNov 4, 2019
Probabilistic Super-Resolution of Solar Magnetograms: Generating Many Explanations and Measuring UncertaintiesXavier Gitiaux, Shane A. Maloney, Anna Jungbluth et al.
Machine learning techniques have been successfully applied to super-resolution tasks on natural images where visually pleasing results are sufficient. However in many scientific domains this is not adequate and estimations of errors and uncertainties are crucial. To address this issue we propose a Bayesian framework that decomposes uncertainties into epistemic and aleatoric uncertainties. We test the validity of our approach by super-resolving images of the Sun's magnetic field and by generating maps measuring the range of possible high resolution explanations compatible with a given low resolution magnetogram.
SRMar 11, 2019
A Machine Learning Dataset Prepared From the NASA Solar Dynamics Observatory MissionRichard Galvez, David F. Fouhey, Meng Jin et al.
In this paper we present a curated dataset from the NASA Solar Dynamics Observatory (SDO) mission in a format suitable for machine learning research. Beginning from level 1 scientific products we have processed various instrumental corrections, downsampled to manageable spatial and temporal resolutions, and synchronized observations spatially and temporally. We illustrate the use of this dataset with two example applications: forecasting future EVE irradiance from present EVE irradiance and translating HMI observations into AIA observations. For each application we provide metrics and baselines for future model comparison. We anticipate this curated dataset will facilitate machine learning research in heliophysics and the physical sciences generally, increasing the scientific return of the SDO mission. This work is a direct result of the 2018 NASA Frontier Development Laboratory Program. Please see the appendix for access to the dataset.