PELG Aug 10, 2025

BIGBOY1.2: Generating Realistic Synthetic Data for Disease Outbreak Modelling and Analytics

arXiv:2508.07239v1
Originality: Synthesis-oriented
AI Analysis

BIGBOY1.2 gives researchers in epidemiology and data science a standardized tool for comparing traditional and modern methods. The contribution is incremental, since it builds on existing synthetic data generation concepts.

The authors tackled the challenge of incomplete and noisy disease outbreak data by creating BIGBOY1.2, an open synthetic dataset generator that produces configurable epidemic time series and population-level trajectories, enabling benchmarking of modelling, forecasting, and visualization methods.

Modelling disease outbreaks remains challenging due to incomplete surveillance data, noise, and limited access to standardized datasets. We have created BIGBOY1.2, an open synthetic dataset generator that produces configurable epidemic time series and population-level trajectories suitable for benchmarking modelling, forecasting, and visualisation methods. The framework supports SEIR and SIR-like compartmental logic, custom seasonality, and noise injection to mimic real reporting artifacts. BIGBOY1.2 can produce datasets with diverse characteristics, making it suitable for comparing traditional epidemiological models (e.g., SIR, SEIR) with modern machine learning approaches (e.g., SVM, neural networks).
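To make the combination of compartmental logic, seasonality, and noise injection concrete, here is a minimal sketch of how such a generator could work. It is not BIGBOY1.2's actual code or API: the function name `simulate_seir`, every parameter name, and all default values are assumptions chosen for illustration. It uses a discrete-time SEIR model with one-day forward-Euler steps, a sinusoidal modulation of the transmission rate, and under-reporting with Gaussian multiplicative noise as a stand-in for reporting artifacts.

```python
import math
import random


def simulate_seir(
    population=100_000,
    initial_infected=10,
    beta0=0.4,          # baseline transmission rate per day (assumed value)
    sigma=1 / 5,        # 1 / incubation period in days (assumed value)
    gamma=1 / 7,        # 1 / infectious period in days (assumed value)
    season_amp=0.2,     # relative amplitude of seasonal forcing (assumed value)
    season_period=365,  # length of the seasonal cycle in days
    report_rate=0.6,    # mean fraction of new cases reported (assumed value)
    noise_sd=0.1,       # std. dev. of multiplicative reporting noise (assumed value)
    days=180,
    seed=0,
):
    """Hypothetical helper: return per-day dicts with S, E, I, R and noisy reported cases."""
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    rows = []
    for day in range(days):
        # Seasonality: a cosine term scales the transmission rate over the year.
        beta = beta0 * (1 + season_amp * math.cos(2 * math.pi * day / season_period))

        # Forward-Euler step with dt = 1 day (deterministic compartment flows).
        new_exposed = min(s, beta * s * i / population)
        new_infectious = sigma * e
        new_recovered = gamma * i

        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered

        # Reporting artifact: under-reporting plus Gaussian noise, clipped at zero.
        noisy = new_infectious * report_rate * (1 + rng.gauss(0, noise_sd))
        reported = round(max(0.0, noisy))

        rows.append({"day": day, "S": s, "E": e, "I": i, "R": r, "reported": reported})
    return rows
```

Calling `simulate_seir(days=120, seed=42)` returns one record per day. The `reported` column plays the role of the noisy surveillance series a forecasting model would see, while `S`, `E`, `I`, and `R` hold the underlying ground-truth trajectory to benchmark against. Setting `sigma` very high approximates SIR-like behaviour, since exposed individuals become infectious almost immediately.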

