Satellite-Surface-Area Machine-Learning Models for Reservoir Storage Estimation: Regime-Sensitive Evaluation and Operational Deployment at Loskop Dam, South Africa
The work provides more accurate daily storage estimates for water allocation and drought response in semiarid regions and addresses rating-curve uncertainty caused by sedimentation and drawdown. The contribution is incremental, since it applies existing ML methods to a specific domain problem.
The study addresses unreliable reservoir storage estimation at Loskop Dam in South Africa by training machine-learning models on satellite surface area data. Ridge regression reached a cross-validated RMSE of 12.3 million cubic meters, and a stacked ensemble reduced the error to about 11 million cubic meters (~3% of live capacity).
Reliable daily estimates of reservoir storage are pivotal for water allocation and drought response decisions in semiarid regions. Conventional rating curves at Loskop Dam, the primary storage on South Africa's Olifants River, have become increasingly uncertain owing to sedimentation and episodic drawdown. A 40-year Digital Earth Africa (DEA) surface area archive (1984-2024) was fused with gauged water levels to develop data-driven volume predictors that operate under a maximum 9.14%, 90-day drawdown constraint. Four nested feature sets were examined: (i) raw water area, (ii) + a power-law "calculated volume" proxy, (iii) + six river geometry metrics, and (iv) + full supply elevation. Five candidate algorithms, Gradient Boosting (GB), Random Forest (RF), Ridge (RI), Lasso (LA) and Elastic Net (EN), were tuned using a 20-draw random search and assessed with a five-fold time-series split to eliminate look-ahead bias. Prediction errors were decomposed into two storage regimes: Low (<250 x 10^6 cubic meters) and High (>250 x 10^6 cubic meters). Ridge regression achieved the lowest cross-validated RMSE (12.3 x 10^6 cubic meters, i.e. 12.3 million cubic meters, MCM), outperforming GB by 16% and RF by 7%. In regime terms, Ridge was superior in the Low band (18.0 vs. 22.7 MCM for GB) and tied RF in the High band (~12 MCM). In-sample diagnostics showed GB's apparent dominance (6.8-5.4 MCM) to be an artefact of overfitting. A Ridge meta-stacked ensemble combining GB, RF, and Ridge reduced full-series RMSE to ~11 MCM (~3% of live capacity). We recommend (i) GB retrained daily for routine operations, (ii) Ridge for drought early warning, and (iii) the stacked blend for all-weather dashboards. Quarterly rolling retraining and regime-specific metrics are advised to maintain operational accuracy below the 5% threshold mandated by the Department of Water and Sanitation.
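The evaluation protocol described above (time-ordered folds, then errors split into Low and High storage regimes) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the helper names (`time_series_folds`, `fit_ridge_1d`, `regime_rmse`, `cross_validate`) are hypothetical, the fold layout follows the expanding-window scheme used by common time-series splitters, the model is a single-feature closed-form ridge fit rather than the full feature sets, a storage of exactly 250 MCM is assigned to the High regime, and the data are synthetic.

```python
import math

LOW_HIGH_THRESHOLD_MCM = 250.0  # regime boundary from the abstract (10^6 m^3)


def time_series_folds(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) pairs; training data always precede test data."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for k in range(n_splits):
        start = first_test_start + k * test_size
        yield list(range(start)), list(range(start, start + test_size))


def fit_ridge_1d(x, y, alpha=1.0):
    """Closed-form single-feature ridge fit with an unpenalised intercept."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)
    return slope, y_mean - slope * x_mean


def rmse(y_true, y_pred):
    """Root-mean-square error; NaN when there are no samples."""
    if not y_true:
        return float("nan")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def regime_rmse(y_true, y_pred, threshold=LOW_HIGH_THRESHOLD_MCM):
    """RMSE over all samples and within the Low and High storage regimes."""
    pairs = list(zip(y_true, y_pred))
    low = [(t, p) for t, p in pairs if t < threshold]
    high = [(t, p) for t, p in pairs if t >= threshold]  # assumption: 250 counts as High
    return {
        "all": rmse(list(y_true), list(y_pred)),
        "low": rmse([t for t, _ in low], [p for _, p in low]),
        "high": rmse([t for t, _ in high], [p for _, p in high]),
    }


def cross_validate(area, volume, n_splits=5, alpha=1.0):
    """Pool out-of-fold predictions, then score them by regime."""
    y_true, y_pred = [], []
    for train_idx, test_idx in time_series_folds(len(area), n_splits):
        slope, intercept = fit_ridge_1d(
            [area[i] for i in train_idx], [volume[i] for i in train_idx], alpha
        )
        y_true += [volume[i] for i in test_idx]
        y_pred += [slope * area[i] + intercept for i in test_idx]
    return regime_rmse(y_true, y_pred)


# Synthetic, purely illustrative series: area in km^2, volume in 10^6 m^3.
area = [10.0 + 0.1 * (i % 120) for i in range(240)]
volume = [20.0 * a - 50.0 + 3.0 * math.sin(i) for i, a in enumerate(area)]
scores = cross_validate(area, volume)
print({name: round(value, 2) for name, value in scores.items()})
```

Because every training window ends before its test window begins, no future observation leaks into a fit, which is the look-ahead safeguard the abstract refers to. Reporting the Low and High scores next to the pooled score shows whether a model that looks good overall degrades during drawdown.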