LG AI · Apr 30, 2025

A comparative study of deep learning and ensemble learning to extend the horizon of traffic forecasting

Xiao Zheng, Saeed Asadi Bagloee, Majid Sarvi

arXiv:2504.21358v14.13 citations · h-index: 21

Originality: Synthesis-oriented

AI Analysis

This is an incremental study that provides practical guidance for researchers and practitioners in intelligent transportation systems on method selection for long-term traffic forecasting.

This paper tackles the challenge of long-term traffic forecasting (up to 30 days ahead) by comparing deep learning and ensemble methods on real-world traffic datasets. It finds that time embedding helps a naive RNN outperform the state-of-the-art Informer by 31.1% for 30-day-ahead predictions, and that XGBoost performs competitively using only time features.

Traffic forecasting is vital for Intelligent Transportation Systems, for which Machine Learning (ML) methods have been extensively explored to develop data-driven Artificial Intelligence (AI) solutions. Recent research focuses on modelling spatial-temporal correlations for short-term traffic prediction, leaving desirable long-term forecasting as a challenging open issue. This paper presents a comparative study on large-scale real-world signalized-arterial and freeway traffic flow datasets, aiming to evaluate promising ML methods over long forecasting horizons of up to 30 days. Focusing on modelling capacity for temporal dynamics, we develop one ensemble ML method, eXtreme Gradient Boosting (XGBoost), and a range of Deep Learning (DL) methods, including Recurrent Neural Network (RNN)-based methods and the state-of-the-art Transformer-based method. Time embedding is leveraged to enhance their understanding of seasonality and event factors. Experimental results highlight that while the attention mechanism/Transformer framework is effective for capturing long-range dependencies in sequential data, as the forecasting horizon extends, the key to effective traffic forecasting gradually shifts from capturing temporal dependencies to modelling periodicity. Time embedding is particularly effective in this context, helping a naive RNN outperform Informer by 31.1% for 30-day-ahead forecasting. Meanwhile, XGBoost, an efficient and robust model, performs competitively with DL methods while learning solely from time features. Moreover, we investigate the impact of factors such as input sequence length, holiday traffic, data granularity, and training data size. The findings offer valuable insights and serve as a reference for future long-term traffic forecasting research and the improvement of AI's corresponding learning capabilities.
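
To make the idea of learning "solely from time features" concrete, the following is a minimal sketch of how a timestamp might be turned into calendar features. The feature set (cyclical time-of-day and day-of-week encodings, weekend and holiday flags) and the function name `time_features` are assumptions for illustration only; the abstract does not specify which features or embedding the authors use.

```python
import math
from datetime import datetime, timedelta


def time_features(ts: datetime, holidays: frozenset = frozenset()) -> list[float]:
    """Encode a timestamp as calendar features (hypothetical feature set)."""
    # Position within the day and the week, expressed as angles so that
    # 23:59 and 00:00 (or Sunday and Monday) end up close together.
    minute_of_day = ts.hour * 60 + ts.minute
    day_angle = 2 * math.pi * minute_of_day / 1440
    week_angle = 2 * math.pi * ts.weekday() / 7
    return [
        math.sin(day_angle),
        math.cos(day_angle),
        math.sin(week_angle),
        math.cos(week_angle),
        1.0 if ts.weekday() >= 5 else 0.0,       # weekend flag
        1.0 if ts.date() in holidays else 0.0,   # holiday (event) flag
    ]


# Features for a timestamp 30 days ahead depend only on the calendar,
# not on recent traffic observations.
start = datetime(2025, 4, 30, 8, 0)
target = start + timedelta(days=30)
features = time_features(target)
```

Because such features are available for any future timestamp, a tabular model such as XGBoost could in principle be trained on them and queried at arbitrary horizons, which is one plausible reading of why periodicity modelling matters more as the horizon grows.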
