Prediction in ungauged regions with sparse flow duration curves and input-selection ensemble modeling
This work substantially improves streamflow prediction accuracy in regions that lack stream gauges, a critical problem for hydrologists and water resource managers.
This paper addresses the challenge of streamflow prediction in ungauged regions by integrating sparse flow duration curve (FDC) data into an LSTM-based network. The proposed method achieved a median Kling-Gupta efficiency (KGE) of 0.62 on a US dataset, substantially higher than values reported in previous state-of-the-art global-scale ungauged basin tests.
While long short-term memory (LSTM) models have demonstrated stellar performance in streamflow prediction, there are major risks in applying them to contiguous regions with no gauges, a problem known as prediction in ungauged regions (PUR). However, softer data such as the flow duration curve (FDC) may already be available from nearby stations, or may become available. Here we demonstrate that sparse FDC data can be migrated and assimilated by an LSTM-based network via an encoder. A stringent region-based holdout test showed a median Kling-Gupta efficiency (KGE) of 0.62 for a US dataset, substantially higher than values reported in previous state-of-the-art global-scale ungauged basin tests. The baseline model without FDC was already competitive (median KGE 0.56), but integrating FDCs added substantial value. Because inputs can be represented inaccurately, the baseline models might sometimes produce catastrophic results. Compiling an ensemble of models with different input selections further improved model generalizability.
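For readers unfamiliar with the quantities named above, the following is a minimal sketch, not the authors' implementation. It assumes the original Gupta et al. (2009) form of the KGE, Weibull plotting positions for the FDC, and a plain time-step average as the ensemble rule; the paper may use different variants. The function names and the default exceedance probabilities are hypothetical and chosen only for illustration.

```python
import math
import statistics


def kge(sim, obs):
    """Kling-Gupta efficiency (Gupta et al., 2009 form); 1.0 is a perfect score."""
    mu_s, mu_o = statistics.fmean(sim), statistics.fmean(obs)
    sd_s, sd_o = statistics.pstdev(sim), statistics.pstdev(obs)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(obs)
    r = cov / (sd_s * sd_o)   # linear correlation between simulation and observation
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def sparse_fdc(flows, exceedance_probs=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Flows exceeded with the given probabilities (hypothetical default set)."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    points = []
    for p in exceedance_probs:
        # Weibull plotting position: rank i (1-based) is exceeded with
        # probability i / (n + 1); pick the nearest rank and clamp to [1, n].
        i = min(max(round(p * (n + 1)), 1), n)
        points.append(ranked[i - 1])
    return points


def ensemble_mean(member_sims):
    """Average several models' simulated series, time step by time step.

    Assumption: a simple mean; the paper's ensemble rule may differ.
    """
    return [statistics.fmean(values) for values in zip(*member_sims)]
```

In this sketch, `sparse_fdc` reduces a flow record to a handful of quantiles, which is the kind of compact "softer" input an encoder could consume, and `kge` scores a simulated series against observations for each basin before the median is taken across basins.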