J. David Neelin

LG
3papers
174citations
Novelty50%
AI Score32

3 Papers

LGJun 14, 2023Code
ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation

Sungduk Yu, Zeyuan Hu, Akshay Subramaniam et al.

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid ML-physics simulations require domain-specific data and workflows that have been inaccessible to many ML experts. As an extension of the ClimSim dataset (Yu et al., 2024), we present ClimSim-Online, which also includes an end-to-end workflow for developing hybrid ML-physics simulators. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. The dataset is global and spans ten years at a high sampling frequency. We provide a cross-platform, containerized pipeline to integrate ML models into operational climate simulators for hybrid testing. We also implement various ML baselines, alongside a hybrid baseline simulator, to highlight the ML challenges of building stable, skillful emulators. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid ML-physics and high-fidelity climate simulations.

LGAug 10, 2024
FuXi Weather: A data-to-forecast machine learning system for global weather

Xiuyu Sun, Xiaohui Zhong, Xiaoze Xu et al.

Weather forecasting traditionally relies on numerical weather prediction (NWP) systems that integrates global observational systems, data assimilation (DA), and forecasting models. Despite steady improvements in forecast accuracy over recent decades, further advances are increasingly constrained by high computational costs, the underutilization of vast observational datasets, and the challenges of obtaining finer resolution. These limitations, alongside the uneven distribution of observational networks, result in global disparities in forecast accuracy, leaving some regions vulnerable to extreme weather. Recent advances in machine learning present a promising alternative, providing more efficient and accurate forecasts using the same initial conditions as NWP. However, current machine learning models still depend on the initial conditions generated by NWP systems, which require extensive computational resources and expertise. Here we introduce FuXi Weather, a machine learning weather forecasting system that assimilates data from multiple satellites. Operating on a 6-hourly DA and forecast cycle, FuXi Weather generates reliable and accurate 10-day global weather forecasts at a spatial resolution of $0.25^\circ$. FuXi Weather is the first system to achieve all-grid, all-surface, all-channel, and all-sky DA and forecasting, extending skillful forecast lead times beyond those of the European Centre for Medium-range Weather Forecasts (ECMWF) high-resolution forecasts (HRES) while using significantly fewer observations. FuXi Weather consistently outperforms ECMWF HRES in observation-sparse regions, such as central Africa, demonstrating its potential to improve forecasts where observational infrastructure is limited.

LGDec 14, 2021
Climate-Invariant Machine Learning

Tom Beucler, Pierre Gentine, Janni Yuval et al.

Projecting climate change is a generalization problem: we extrapolate the recent past using physical models across past, present, and future climates. Current climate models require representations of processes that occur at scales smaller than model grid size, which have been the main source of model projection uncertainty. Recent machine learning (ML) algorithms hold promise to improve such process representations, but tend to extrapolate poorly to climate regimes they were not trained on. To get the best of the physical and statistical worlds, we propose a new framework - termed "climate-invariant" ML - incorporating knowledge of climate processes into ML algorithms, and show that it can maintain high offline accuracy across a wide range of climate conditions and configurations in three distinct atmospheric models. Our results suggest that explicitly incorporating physical knowledge into data-driven models of Earth system processes can improve their consistency, data efficiency, and generalizability across climate regimes.