Yongquan Qu

h-index13

6papers

106citations

Novelty58%

AI Score45

Ranked #69,370 of 201,326 authors (top 34%)#15,691 in LG (top 37%)

6 Papers

CVApr 10, 2024Code

Deep Generative Data Assimilation in Multimodal Setting

Yongquan Qu, Juan Nathaniel, Shuolin Li et al.

Robust integration of physical knowledge and data is key to improve computational simulations, such as Earth system models. Data assimilation is crucial for achieving this goal because it provides a systematic framework to calibrate model outputs with observations, which can include remote sensing imagery and ground station measurements, with uncertainty quantification. Conventional methods, including Kalman filters and variational approaches, inherently rely on simplifying linear and Gaussian assumptions, and can be computationally expensive. Nevertheless, with the rapid adoption of data-driven methods in many areas of computational sciences, we see the potential of emulating traditional data assimilation with deep learning, especially generative models. In particular, the diffusion-based probabilistic framework has large overlaps with data assimilation principles: both allows for conditional generation of samples with a Bayesian inverse framework. These models have shown remarkable success in text-conditioned image generation or image-controlled video synthesis. Likewise, one can frame data assimilation as observation-conditioned state calibration. In this work, we propose SLAMS: Score-based Latent Assimilation in Multimodal Setting. Specifically, we assimilate in-situ weather station data and ex-situ satellite imagery to calibrate the vertical temperature profiles, globally. Through extensive ablation, we demonstrate that SLAMS is robust even in low-resolution, noisy, and sparse data settings. To our knowledge, our work is the first to apply deep generative framework for multimodal data assimilation using real-world datasets; an important step for building robust computational simulators, including the next-generation Earth system models. Our code is available at: https://github.com/yongquan-qu/SLAMS

CVFeb 1, 2024

ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction

Juan Nathaniel, Yongquan Qu, Tung Nguyen et al.

Accurate prediction of climate in the subseasonal-to-seasonal scale is crucial for disaster preparedness and robust decision making amidst climate change. Yet, forecasting beyond the weather timescale is challenging because it deals with problems other than initial condition, including boundary interaction, butterfly effect, and our inherent lack of physical understanding. At present, existing benchmarks tend to have shorter forecasting range of up-to 15 days, do not include a wide range of operational baselines, and lack physics-based constraints for explainability. Thus, we propose ChaosBench, a challenging benchmark to extend the predictability range of data-driven weather emulators to S2S timescale. First, ChaosBench is comprised of variables beyond the typical surface-atmospheric ERA5 to also include ocean, ice, and land reanalysis products that span over 45 years to allow for full Earth system emulation that respects boundary conditions. We also propose physics-based, in addition to deterministic and probabilistic metrics, to ensure a physically-consistent ensemble that accounts for butterfly effect. Furthermore, we evaluate on a diverse set of physics-based forecasts from four national weather agencies as baselines to our data-driven counterpart such as ViT/ClimaX, PanguWeather, GraphCast, and FourCastNetV2. Overall, we find methods originally developed for weather-scale applications fail on S2S task: their performance simply collapse to an unskilled climatology. Nonetheless, we outline and demonstrate several strategies that can extend the predictability range of existing weather emulators, including the use of ensembles, robust control of error propagation, and the use of physics-informed models. Our benchmark, datasets, and instructions are available at https://leap-stc.github.io/ChaosBench.

LGMar 4, 2024

Joint Parameter and Parameterization Inference with Uncertainty Quantification through Differentiable Programming

Yongquan Qu, Mohamed Aziz Bhouri, Pierre Gentine

Accurate representations of unknown and sub-grid physical processes through parameterizations (or closure) in numerical simulations with quantified uncertainty are critical for resolving the coarse-grained partial differential equations that govern many problems ranging from weather and climate prediction to turbulence simulations. Recent advances have seen machine learning (ML) increasingly applied to model these subgrid processes, resulting in the development of hybrid physics-ML models through the integration with numerical solvers. In this work, we introduce a novel framework for the joint estimation of physical parameters and machine learning parameterizations with uncertainty quantification. Our framework incorporates online training and efficient Bayesian inference within a high-dimensional parameter space, facilitated by differentiable programming. This proof of concept underscores the substantial potential of differentiable programming in synergistically combining machine learning with differential equations, thereby enhancing the capabilities of hybrid physics-ML modeling.

LGMay 23, 2025

Strictly Constrained Generative Modeling via Split Augmented Langevin Sampling

Matthieu Blanke, Yongquan Qu, Sara Shamekh et al.

Deep generative models hold great promise for representing complex physical systems, but their deployment is currently limited by the lack of guarantees on the physical plausibility of the generated outputs. Ensuring that known physical constraints are enforced is therefore critical when applying generative models to scientific and engineering problems. We address this limitation by developing a principled framework for sampling from a target distribution while rigorously satisfying physical constraints. Leveraging the variational formulation of Langevin dynamics, we propose Split Augmented Langevin (SAL), a novel primal-dual sampling algorithm that enforces constraints progressively through variable splitting, with convergence guarantees. While the method is developed theoretically for Langevin dynamics, we demonstrate its effective applicability to diffusion models. In particular, we use constrained diffusion models to generate physical fields satisfying energy and mass conservation laws. We apply our method to diffusion-based data assimilation on a complex physical system, where enforcing physical constraints substantially improves both forecast accuracy and the preservation of critical conserved quantities. We also demonstrate the potential of SAL for challenging feasibility problems in optimal control.

LGOct 5, 2025

Incorporating Multivariate Consistency in ML-Based Weather Forecasting with Latent-space Constraints

Hang Fan, Yi Xiao, Yongquan Qu et al.

Data-driven machine learning (ML) models have recently shown promise in surpassing traditional physics-based approaches for weather forecasting, leading to a so-called second revolution in weather forecasting. However, most ML-based forecast models treat reanalysis as the truth and are trained under variable-specific loss weighting, ignoring their physical coupling and spatial structure. Over long time horizons, the forecasts become blurry and physically unrealistic under rollout training. To address this, we reinterpret model training as a weak-constraint four-dimensional variational data assimilation (WC-4DVar) problem, treating reanalysis data as imperfect observations. This allows the loss function to incorporate reanalysis error covariance and capture multivariate dependencies. In practice, we compute the loss in a latent space learned by an autoencoder (AE), where the reanalysis error covariance becomes approximately diagonal, thus avoiding the need to explicitly model it in the high-dimensional model space. We show that rollout training with latent-space constraints improves long-term forecast skill and better preserves fine-scale structures and physical realism compared to training with model-space loss. Finally, we extend this framework to accommodate heterogeneous data sources, enabling the forecast model to be trained jointly on reanalysis and multi-source observations within a unified theoretical formulation.

LGAug 1, 2025

PnP-DA: Towards Principled Plug-and-Play Integration of Variational Data Assimilation and Generative Models

Yongquan Qu, Matthieu Blanke, Sara Shamekh et al.

Earth system modeling presents a fundamental challenge in scientific computing: capturing complex, multiscale nonlinear dynamics in computationally efficient models while minimizing forecast errors caused by necessary simplifications. Even the most powerful AI- or physics-based forecast system suffer from gradual error accumulation. Data assimilation (DA) aims to mitigate these errors by optimally blending (noisy) observations with prior model forecasts, but conventional variational methods often assume Gaussian error statistics that fail to capture the true, non-Gaussian behavior of chaotic dynamical systems. We propose PnP-DA, a Plug-and-Play algorithm that alternates (1) a lightweight, gradient-based analysis update (using a Mahalanobis-distance misfit on new observations) with (2) a single forward pass through a pretrained generative prior conditioned on the background forecast via a conditional Wasserstein coupling. This strategy relaxes restrictive statistical assumptions and leverages rich historical data without requiring an explicit regularization functional, and it also avoids the need to backpropagate gradients through the complex neural network that encodes the prior during assimilation cycles. Experiments on standard chaotic testbeds demonstrate that this strategy consistently reduces forecast errors across a range of observation sparsities and noise levels, outperforming classical variational methods.