CVJan 17, 2022

SwinUNet3D -- A Hierarchical Architecture for Deep Traffic Prediction using Shifted Window Transformers

arXiv:2201.06390v16 citationsHas Code
AI Analysis

This work addresses traffic prediction for mobility management and logistics, but it appears incremental as it adapts existing transformer and UNet methods to a specific domain.

The paper tackles traffic forecasting by proposing SwinUNet3D, a hierarchical architecture that replaces convolution-based blocks in UNet with 3D shifted window transformers in both encoder and decoder branches, and it was tested on the Traffic4cast2021 dataset to predict an hour of traffic conditions from one hour of given data.

Traffic forecasting is an important element of mobility management, an important key that drives the logistics industry. Over the years, lots of work have been done in Traffic forecasting using time series as well as spatiotemporal dynamic forecasting. In this paper, we explore the use of vision transformer in a UNet setting. We completely remove all convolution-based building blocks in UNet, while using 3D shifted window transformer in both encoder and decoder branches. In addition, we experiment with the use of feature mixing just before patch encoding to control the inter-relationship of the feature while avoiding contraction of the depth dimension of our spatiotemporal input. The proposed network is tested on the data provided by Traffic Map Movie Forecasting Challenge 2021(Traffic4cast2021), held in the competition track of Neural Information Processing Systems (NeurIPS). Traffic4cast2021 task is to predict an hour (6 frames) of traffic conditions (volume and average speed)from one hour of given traffic state (12 frames averaged in 5 minutes time span). Source code is available online at https://github.com/bojesomo/Traffic4Cast2021-SwinUNet3D.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes