AO-PHSep 22, 2025
FastNet: Improving the physical consistency of machine-learning weather prediction models through loss function designTom Dunstan, Oliver Strickson, Thusal Bennett et al.
Machine learning weather prediction (MLWP) models have demonstrated remarkable potential in delivering accurate forecasts at significantly reduced computational cost compared to traditional numerical weather prediction (NWP) systems. However, challenges remain in ensuring the physical consistency of MLWP outputs, particularly in deterministic settings. This study presents FastNet, a graph neural network (GNN)-based global prediction model, and investigates the impact of alternative loss function designs on improving the physical realism of its forecasts. We explore three key modifications to the standard mean squared error (MSE) loss: (1) a modified spherical harmonic (MSH) loss that penalises spectral amplitude errors to reduce blurring and enhance small-scale structure retention; (2) inclusion of horizontal gradient terms in the loss to suppress non-physical artefacts; and (3) an alternative wind representation that decouples speed and direction to better capture extreme wind events. Results show that while the MSH and gradient-based losses \textit{alone} may slightly degrade RMSE scores, when trained in combination the model exhibits very similar MSE performance to an MSE-trained model while at the same time significantly improving spectral fidelity and physical consistency. The alternative wind representation further improves wind speed accuracy and reduces directional bias. Collectively, these findings highlight the importance of loss function design as a mechanism for embedding domain knowledge into MLWP models and advancing their operational readiness.
LGJun 24, 2021
Improved Regret Bounds for Tracking Experts with MemoryJames Robinson, Mark Herbster
We address the problem of sequential prediction with expert advice in a non-stationary environment with long-term memory guarantees in the sense of Bousquet and Warmuth [4]. We give a linear-time algorithm that improves on the best known regret bounds [26]. This algorithm incorporates a relative entropy projection step. This projection is advantageous over previous weight-sharing approaches in that weight updates may come with implicit costs as in for example portfolio optimization. We give an algorithm to compute this projection step in linear time, which may be of independent interest.
CRAug 23, 2019
Design choices for productive, secure, data-intensive research at scale in the cloudDiego Arenas, Jon Atkins, Claire Austin et al.
We present a policy and process framework for secure environments for productive data science research projects at scale, by combining prevailing data security threat and risk profiles into five sensitivity tiers, and, at each tier, specifying recommended policies for data classification, data ingress, software ingress, data egress, user access, user device control, and analysis environments. By presenting design patterns for security choices for each tier, and using software defined infrastructure so that a different, independent, secure research environment can be instantiated for each project appropriate to its classification, we hope to maximise researcher productivity and minimise risk, allowing research organisations to operate with confidence.
LGJun 17, 2018
Online Prediction of Switching Graph Labelings with Cluster SpecialistsMark Herbster, James Robinson
We address the problem of predicting the labeling of a graph in an online setting when the labeling is changing over time. We present an algorithm based on a specialist approach; we develop the machinery of cluster specialists which probabilistically exploits the cluster structure in the graph. Our algorithm has two variants, one of which surprisingly only requires $\mathcal{O}(\log n)$ time on any trial $t$ on an $n$-vertex graph, an exponential speed up over existing methods. We prove switching mistake-bound guarantees for both variants of our algorithm. Furthermore these mistake bounds smoothly vary with the magnitude of the change between successive labelings. We perform experiments on Chicago Divvy Bicycle Sharing data and show that our algorithms significantly outperform an existing algorithm (a kernelized Perceptron) as well as several natural benchmarks.