Visibility nowcasting in South Korea: a machine learning approach to class imbalance and distribution shift
For operational nowcasting in transportation and air quality management, this work highlights the need to account for distribution shift; however, the approach is incremental, and its main finding, the performance degradation on the test set, is a negative result.
This study develops a machine learning framework for visibility nowcasting in six South Korean cities, addressing class imbalance with SMOTENC and CTGAN and using an ensemble of models. The results show a marked decline in test performance, attributed to temporal distribution shift and supported by the Wasserstein distance computed on the feature that SHAP ranks as most influential.
Atmospheric visibility is a critical variable for transportation safety and air quality management. However, accurate prediction remains challenging because of the complex interactions between meteorological conditions and air pollutants and the rarity of low-visibility events. This study introduces a machine learning framework to nowcast visibility in six major South Korean cities. To handle the class imbalance in the 2018-2020 training data, we applied the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTENC) and the Conditional Tabular Generative Adversarial Network (CTGAN). An ensemble combining machine learning and deep learning models was then trained and evaluated on a 2021 test dataset. The results revealed a marked decline in predictive performance on the test set compared with the cross-validation phase. This degradation was attributed to a distributional shift between the training and testing periods, which was quantified by measuring the Wasserstein distance between the two periods for the most influential feature identified by SHAP analysis. Overall, this study presents a methodology that aims to address the dual challenges of data imbalance and temporal distributional shift, and it emphasizes the need to account for evolving external environmental factors when deploying nowcasting models on time-series data.
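To make the distribution-shift check concrete, the following is a minimal sketch of a one-dimensional Wasserstein-1 distance between two empirical samples, computed as the area between their empirical CDFs. It is not the authors' code: the function name `wasserstein_1d` and the toy values standing in for training-period and test-period observations of a single feature are assumptions made purely for illustration.

```python
from bisect import bisect_right


def wasserstein_1d(sample_a, sample_b):
    """Wasserstein-1 distance between two 1-D empirical distributions.

    Computed as the integral of |F_a(x) - F_b(x)| over x, where F_a and F_b
    are the empirical CDFs of the two samples.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")

    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    # Both CDFs are constant between consecutive points of the pooled sample.
    pooled = sorted(a_sorted + b_sorted)

    distance = 0.0
    for left, right in zip(pooled, pooled[1:]):
        width = right - left
        if width == 0:
            continue
        # Fraction of each sample that is <= left (empirical CDF at left).
        cdf_a = bisect_right(a_sorted, left) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, left) / len(b_sorted)
        distance += abs(cdf_a - cdf_b) * width
    return distance


# Hypothetical values of one feature in the training and test periods.
train_feature = [0.0, 1.0, 3.0]
test_feature = [5.0, 6.0, 8.0]

shift = wasserstein_1d(train_feature, test_feature)
print(f"Wasserstein distance between periods: {shift:.3f}")
```

A distance of zero means the two samples share the same empirical distribution; larger values mean the feature has moved further between periods, in the feature's own units. In practice the same quantity is available as `scipy.stats.wasserstein_distance`.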