MLLGDec 8, 2025

Machine learning in an expectation-maximisation framework for nowcasting

arXiv:2512.07335v1h-index: 39
Originality Incremental advance
AI Analysis

This work addresses nowcasting challenges for decision-makers in fields like public health, though it is incremental as it builds on existing EM frameworks by integrating modern machine learning methods.

The paper tackles the problem of nowcasting incomplete information due to reporting delays by proposing an expectation-maximisation framework that uses machine learning techniques like neural networks and XGBoost to model event occurrence and reporting processes. It shows that this approach outperforms existing EM-based methods using generalised linear models in simulation experiments and in an application to Argentinian Covid-19 case reporting, with XGBoost performing best.

Decision making often occurs in the presence of incomplete information, leading to the under- or overestimation of risk. Leveraging the observable information to learn the complete information is called nowcasting. In practice, incomplete information is often a consequence of reporting or observation delays. In this paper, we propose an expectation-maximisation (EM) framework for nowcasting that uses machine learning techniques to model both the occurrence as well as the reporting process of events. We allow for the inclusion of covariate information specific to the occurrence and reporting periods as well as characteristics related to the entity for which events occurred. We demonstrate how the maximisation step and the information flow between EM iterations can be tailored to leverage the predictive power of neural networks and (extreme) gradient boosting machines (XGBoost). With simulation experiments, we show that we can effectively model both the occurrence and reporting of events when dealing with high-dimensional covariate information. In the presence of non-linear effects, we show that our methodology outperforms existing EM-based nowcasting frameworks that use generalised linear models in the maximisation step. Finally, we apply the framework to the reporting of Argentinian Covid-19 cases, where the XGBoost-based approach again is most performant.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes