Lightweight Temporal Transformer Decomposition for Federated Autonomous Driving
This work aims to make temporal data processing practical for federated learning in autonomous driving, though it appears incremental: it builds on existing transformer methods and mainly adds efficiency optimizations.
The paper tackles the problem of resource-intensive temporal fusion networks in autonomous driving by proposing lightweight temporal transformer decomposition, which reduces model complexity and enables efficient training and real-time predictions, outperforming recent approaches on three datasets.
Traditional vision-based autonomous driving systems often struggle to navigate complex environments when relying solely on single-image inputs. To overcome this limitation, incorporating temporal data, such as past image frames or steering sequences, has proven effective in enhancing robustness and adaptability in challenging scenarios. While high-performance methods exist, they often rely on resource-intensive fusion networks, making them impractical to train and unsuitable for federated learning. To address these challenges, we propose lightweight temporal transformer decomposition, a method that processes sequential image frames and temporal steering data by breaking down large attention maps into smaller matrices. This approach reduces model complexity, enabling efficient weight updates for convergence and real-time predictions while leveraging temporal information to enhance autonomous driving performance. Extensive experiments on three datasets demonstrate that our method outperforms recent approaches by a clear margin while achieving real-time performance. Additionally, real robot experiments further confirm the effectiveness of our method.
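The abstract does not say how the attention maps are decomposed, so the following is only a generic illustration of the idea of replacing one large attention map with two smaller matrices. It assumes a plain low-rank factorization via truncated SVD, which may differ from the paper's actual decomposition. The token count, feature dimension, rank, and the helper name `low_rank_factors` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def low_rank_factors(attn, rank):
    """Truncated SVD: attn (n x n) is approximated by left (n x rank) @ right (rank x n).

    Illustrative stand-in only; not the decomposition used in the paper.
    """
    u, s, vt = np.linalg.svd(attn, full_matrices=False)
    left = u[:, :rank] * s[:rank]   # scale each kept column by its singular value
    right = vt[:rank, :]
    return left, right


# Hypothetical sizes: 64 temporal tokens, 16-dim features, rank-8 factors.
rng = np.random.default_rng(0)
n_tokens, dim, rank = 64, 16, 8
q = rng.standard_normal((n_tokens, dim))
k = rng.standard_normal((n_tokens, dim))

# Standard scaled dot-product attention map of shape (n_tokens, n_tokens).
attn = softmax(q @ k.T / np.sqrt(dim))

left, right = low_rank_factors(attn, rank)

full_size = attn.size                      # entries in the full attention map
factored_size = left.size + right.size     # entries in the two smaller matrices
rel_error = np.linalg.norm(attn - left @ right) / np.linalg.norm(attn)

print(f"full map entries:     {full_size}")
print(f"factored entries:     {factored_size}")
print(f"relative approx. err: {rel_error:.3f}")
```

With these sizes the two factors store 2 x 64 x 8 entries instead of 64 x 64, a fourfold reduction. That saving in stored and transmitted values is what matters when weights have to be exchanged in a federated setting. The approximation error depends on how quickly the singular values of the attention map decay.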