LG | AI | Jun 24, 2023

Unleashing Realistic Air Quality Forecasting: Introducing the Ready-to-Use PurpleAirSF Dataset

arXiv:2306.13948v2 | 4 citations | h-index: 19 | Has Code
Originality: Synthesis-oriented
AI Analysis

This paper provides a ready-to-use dataset for researchers developing air quality forecasting models. Its contribution is incremental, since it focuses on data availability rather than novel methods.

The paper addresses the shortage of open-source datasets for air quality forecasting by introducing PurpleAirSF, a comprehensive dataset collected from the PurpleAir network that offers high temporal resolution and diverse geographical coverage. It also establishes benchmarks on the dataset using spatio-temporal models.

Air quality forecasting has garnered significant attention recently, with data-driven models taking center stage due to advancements in machine learning and deep learning models. However, researchers face challenges with complex data acquisition and the lack of open-sourced datasets, hindering efficient model validation. This paper introduces PurpleAirSF, a comprehensive and easily accessible dataset collected from the PurpleAir network. With its high temporal resolution, various air quality measures, and diverse geographical coverage, this dataset serves as a useful tool for researchers aiming to develop novel forecasting models, study air pollution patterns, and investigate their impacts on health and the environment. We present a detailed account of the data collection and processing methods employed to build PurpleAirSF. Furthermore, we conduct preliminary experiments using both classic and modern spatio-temporal forecasting models, thereby establishing a benchmark for future air quality forecasting tasks.
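To make the benchmarking step concrete, the following is a minimal sketch of how a forecasting task of this kind is commonly framed: sensor readings are cut into sliding input/target windows, and a simple persistence baseline is scored with mean absolute error. It is not the authors' pipeline. The synthetic data, the window lengths, and every function name here are assumptions chosen for illustration, and the actual PurpleAirSF file layout is not assumed.

```python
import math
import random


def make_windows(series, history, horizon):
    """Split a list of time steps (each a list of per-sensor readings)
    into (input window, target window) pairs."""
    pairs = []
    for start in range(len(series) - history - horizon + 1):
        x = series[start:start + history]
        y = series[start + history:start + history + horizon]
        pairs.append((x, y))
    return pairs


def persistence_forecast(x, horizon):
    """Repeat the last observed reading of every sensor for each future step."""
    return [list(x[-1]) for _ in range(horizon)]


def mean_absolute_error(pred, target):
    """Average absolute difference over all steps and sensors."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            total += abs(p - t)
            count += 1
    return total / count


def synthetic_pm25(num_steps, num_sensors, seed=0):
    """Hypothetical stand-in for real readings: a daily cycle plus noise."""
    rng = random.Random(seed)
    return [
        [20 + 10 * math.sin(2 * math.pi * t / 24 + s) + rng.gauss(0, 1)
         for s in range(num_sensors)]
        for t in range(num_steps)
    ]


# Assumed setup: 200 hourly steps, 5 sensors, 12 steps in, 3 steps out.
series = synthetic_pm25(num_steps=200, num_sensors=5)
pairs = make_windows(series, history=12, horizon=3)
errors = [mean_absolute_error(persistence_forecast(x, 3), y) for x, y in pairs]
baseline_mae = sum(errors) / len(errors)
print(f"windows: {len(pairs)}, persistence MAE: {baseline_mae:.3f}")
```

A persistence baseline like this is a common reference point: a learned spatio-temporal model is usually expected to beat it on the same windows before its results are considered meaningful.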

Code Implementations: 1 repo
