CV | LG | IV | Oct 24, 2019

Spatiotemporal Tile-based Attention-guided LSTMs for Traffic Video Prediction

arXiv:1910.11030v4 | 6 citations
Originality: Incremental advance
AI Analysis

This work addresses traffic prediction for urban planning and management, but it is incremental as it builds on existing Conv-LSTM ideas with specific enhancements for scalability.

The paper tackles traffic video prediction by modeling fine-grained and coarse spatial structure together with temporal relationships. It introduces a tile-aware, cascaded-memory Conv-LSTM with cross-frame attention and a memory-flexible training scheme, and reports improved scalability and competitive forecasting performance on large-scale traffic heatmaps.

This extended abstract describes our solution for the Traffic4Cast Challenge 2019. The task requires modeling both fine-grained (pixel-level) and coarse (region-level) spatial structure while preserving temporal relationships across long sequences. Building on Conv-LSTM ideas, we introduce a tile-aware, cascaded-memory Conv-LSTM augmented with cross-frame additive attention and a memory-flexible training scheme: frames are sampled per spatial tile so the model learns tile-local dynamics and per-tile memory cells can be updated sparsely, paged, or compressed to scale to large maps. We provide a compact theoretical analysis (tight softmax/attention Lipschitz bound and a tiling error lower bound) explaining stability and the memory-accuracy tradeoffs, and empirically demonstrate improved scalability and competitive forecasting performance on large-scale traffic heatmaps.
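The two ideas named in the abstract, per-tile frame sampling and cross-frame additive attention, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`split_into_tiles`, `sample_tile_sequence`, `additive_attention_weights`), the non-overlapping tile layout, and the weight shapes are all assumptions made for illustration, and the attention follows the generic additive form `v . tanh(W_q q + W_k k)` rather than anything confirmed by the paper.

```python
import math
import random


def split_into_tiles(frame, tile_h, tile_w):
    """Split a 2D frame (list of rows) into non-overlapping tiles.

    Returns a dict keyed by (tile_row, tile_col). Non-overlapping tiling
    is an assumption of this sketch.
    """
    height, width = len(frame), len(frame[0])
    if height % tile_h or width % tile_w:
        raise ValueError("frame size must be divisible by tile size")
    tiles = {}
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            tiles[(top // tile_h, left // tile_w)] = [
                row[left:left + tile_w] for row in frame[top:top + tile_h]
            ]
    return tiles


def sample_tile_sequence(frames, tile_h, tile_w, rng):
    """Pick one tile position at random and crop it from every frame."""
    n_rows = len(frames[0]) // tile_h
    n_cols = len(frames[0][0]) // tile_w
    key = (rng.randrange(n_rows), rng.randrange(n_cols))
    return key, [split_into_tiles(f, tile_h, tile_w)[key] for f in frames]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def additive_attention_weights(query, keys, w_q, w_k, v):
    """Additive attention over past frames.

    score_t = v . tanh(w_q @ query + w_k @ key_t), followed by a
    numerically stable softmax over t.
    """
    q_proj = matvec(w_q, query)
    scores = []
    for key in keys:
        k_proj = matvec(w_k, key)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    peak = max(scores)  # subtract the max before exponentiating
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Toy 4x4 "heatmap" repeated over three time steps.
    frame = [[r * 4 + c for c in range(4)] for r in range(4)]
    key, sequence = sample_tile_sequence([frame] * 3, 2, 2, random.Random(0))
    print("sampled tile", key, "sequence length", len(sequence))
```

In a setup like this, each sampled tile sequence would be fed to a recurrent cell with its own memory, and the attention weights would mix hidden states from earlier frames of the same tile; how the paper wires these together is not specified in the abstract.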

Code Implementations: 1 repo