Fusing Multi- and Hyperspectral Satellite Data for Harmful Algal Bloom Monitoring with Self-Supervised and Hierarchical Deep Learning
This work advances scalable HAB monitoring for environmental agencies and researchers in label-scarce aquatic environments, though it is incremental in applying self-supervised learning to this domain.
The paper tackled the problem of detecting and mapping harmful algal blooms (HABs) using satellite data without labeled datasets, achieving strong agreement with in-situ measurements for phytoplankton and specific species like Karenia brevis.
We present a self-supervised machine learning framework for detecting and mapping harmful algal bloom (HAB) severity and speciation using multi-sensor satellite data. By fusing reflectance data from operational instruments (VIIRS, MODIS, Sentinel-3, PACE) with TROPOMI solar-induced fluorescence (SIF), our framework, called SIT-FUSE, generates HAB severity and speciation products without requiring per-instrument labeled datasets. The framework employs self-supervised representation learning, hierarchical deep clustering to segment phytoplankton concentrations and speciations into interpretable classes, validated against in-situ data from the Gulf of Mexico and Southern California (2018-2025). Results show strong agreement with total phytoplankton, Karenia brevis, Alexandrium spp., and Pseudo-nitzschia spp. measurements. This work advances scalable HAB monitoring in label-scarce environments while enabling exploratory analysis via hierarchical embeddings: a critical step toward operationalizing self-supervised learning for global aquatic biogeochemistry.