LGMar 18, 2023
Machine learning with data assimilation and uncertainty quantification for dynamical systems: a reviewSibo Cheng, Cesar Quilodran-Casas, Said Ouala et al.
Data Assimilation (DA) and Uncertainty quantification (UQ) are extensively used in analysing and reducing error propagation in high-dimensional spatial-temporal dynamics. Typical applications span from computational fluid dynamics (CFD) to geoscience and climate systems. Recently, much effort has been given in combining DA, UQ and machine learning (ML) techniques. These research efforts seek to address some critical challenges in high-dimensional dynamical systems, including but not limited to dynamical system identification, reduced order surrogate modelling, error covariance specification and model error correction. A large number of developed techniques and methodologies exhibit a broad applicability across numerous domains, resulting in the necessity for a comprehensive guide. This paper provides the first overview of the state-of-the-art researches in this interdisciplinary field, covering a wide range of applications. This review aims at ML scientists who attempt to apply DA and UQ techniques to improve the accuracy and the interpretability of their models, but also at DA and UQ experts who intend to integrate cutting-edge ML approaches to their systems. Therefore, this article has a special focus on how ML methods can overcome the existing limits of DA and UQ, and vice versa. Some exciting perspectives of this rapidly developing research field are also discussed.
LGFeb 18, 2025
Ensemble Kalman filter in latent space using a variational autoencoder pairIvo Pasmans, Yumeng Chen, Tobias Sebastian Finn et al.
Popular (ensemble) Kalman filter data assimilation (DA) approaches assume that the errors in both the a priori estimate of the state and those in the observations are Gaussian. For constrained variables, e.g. sea ice concentration or stress, such an assumption does not hold. The variational autoencoder (VAE) is a machine learning (ML) technique that allows to map an arbitrary distribution to/from a latent space in which the distribution is supposedly closer to a Gaussian. We propose a novel hybrid DA-ML approach in which VAEs are incorporated in the DA procedure. Specifically, we introduce a variant of the popular ensemble transform Kalman filter (ETKF) in which the analysis is applied in the latent space of a single VAE or a pair of VAEs. In twin experiments with a simple circular model, whereby the circle represents an underlying submanifold to be respected, we find that the use of a VAE ensures that a posteri ensemble members lie close to the manifold containing the truth. Furthermore, online updating of the VAE is necessary and achievable when this manifold varies in time, i.e. when it is non-stationary. We demonstrate that introducing an additional second latent space for the observational innovations improves robustness against detrimental effects of non-Gaussianity and bias in the observational errors but it slightly lessens the performance if observational errors are strictly Gaussian.
AO-PHAug 20, 2025
Generative AI models capture realistic sea-ice evolution from days to decadesTobias Sebastian Finn, Marc Bocquet, Pierre Rampal et al.
Sea ice plays an important role in stabilising the Earth system. Yet, representing its dynamics remains a major challenge for models, as the underlying processes are scale-invariant and highly anisotropic. This poses a dilemma: physics-based models that faithfully reproduce the observed dynamics are computationally costly, while efficient AI models sacrifice realism. Here, to resolve this dilemma, we introduce GenSIM, the first generative AI model to predict the evolution of the full Arctic sea-ice state at 12-hour increments. Trained for sub-daily forecasting on 20 years of sea-ice-ocean simulation data, GenSIM makes realistic predictions for 30 years, while reproducing the dynamical properties of sea ice with its leads and ridges and capturing long-term trends in the sea-ice volume. Notably, although solely driven by atmospheric reanalysis, GenSIM implicitly learns hidden signatures of multi-year ice-ocean interaction. Therefore, generative AI can extrapolate from sub-daily forecasts to decadal simulations, while retaining physical consistency.
AO-PHApr 7, 2025
Hybrid machine learning data assimilation for marine biogeochemistryIeuan Higgs, Ross Bannister, Jozef Skákala et al.
Marine biogeochemistry models are critical for forecasting, as well as estimating ecosystem responses to climate change and human activities. Data assimilation (DA) improves these models by aligning them with real-world observations, but marine biogeochemistry DA faces challenges due to model complexity, strong nonlinearity, and sparse, uncertain observations. Existing DA methods applied to marine biogeochemistry struggle to update unobserved variables effectively, while ensemble-based methods are computationally too expensive for high-complexity marine biogeochemistry models. This study demonstrates how machine learning (ML) can improve marine biogeochemistry DA by learning statistical relationships between observed and unobserved variables. We integrate ML-driven balancing schemes into a 1D prototype of a system used to forecast marine biogeochemistry in the North-West European Shelf seas. ML is applied to predict (i) state-dependent correlations from free-run ensembles and (ii), in an ``end-to-end'' fashion, analysis increments from an Ensemble Kalman Filter. Our results show that ML significantly enhances updates for previously not-updated variables when compared to univariate schemes akin to those used operationally. Furthermore, ML models exhibit moderate transferability to new locations, a crucial step toward scaling these methods to 3D operational systems. We conclude that ML offers a clear pathway to overcome current computational bottlenecks in marine biogeochemistry DA and that refining transferability, optimizing training data sampling, and evaluating scalability for large-scale marine forecasting, should be future research priorities.
COMP-PHSep 9, 2020
Combining data assimilation and machine learning to infer unresolved scale parametrisationJulien Brajard, Alberto Carrassi, Marc Bocquet et al.
In recent years, machine learning (ML) has been proposed to devise data-driven parametrisations of unresolved processes in dynamical numerical models. In most cases, the ML training leverages high-resolution simulations to provide a dense, noiseless target state. Our goal is to go beyond the use of high-resolution simulations and train ML-based parametrisation using direct data, in the realistic scenario of noisy and sparse observations. The algorithm proposed in this work is a two-step process. First, data assimilation (DA) techniques are applied to estimate the full state of the system from a truncated model. The unresolved part of the truncated model is viewed as a model error in the DA system. In a second step, ML is used to emulate the unresolved part, a predictor of model error given the state of the system. Finally, the ML-based parametrisation model is added to the physical core truncated model to produce a hybrid model. The DA component of the proposed method relies on an ensemble Kalman filter while the ML parametrisation is represented by a neural network. The approach is applied to the two-scale Lorenz model and to MAOOAM, a reduced-order coupled ocean-atmosphere model. We show that in both cases the hybrid model yields forecasts with better skill than the truncated model. Moreover, the attractor of the system is significantly better represented by the hybrid model than by the truncated model.
MLJan 17, 2020
Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximizationMarc Bocquet, Julien Brajard, Alberto Carrassi et al.
The reconstruction from observations of high-dimensional chaotic dynamics such as geophysical flows is hampered by (i) the partial and noisy observations that can realistically be obtained, (ii) the need to learn from long time series of data, and (iii) the unstable nature of the dynamics. To achieve such inference from the observations over long time series, it has been suggested to combine data assimilation and machine learning in several ways. We show how to unify these approaches from a Bayesian perspective using expectation-maximization and coordinate descents. In doing so, the model, the state trajectory and model error statistics are estimated all together. Implementations and approximations of these methods are discussed. Finally, we numerically and successfully test the approach on two relevant low-order chaotic models with distinct identifiability.