LG | Jul 12, 2025

Temporal Misalignment Attacks against Multimodal Perception in Autonomous Driving

Md Hasan Shahriar, Md Mohaimin Al Barat, Harshavardhan Sundar, Ning Zhang, Naren Ramakrishnan, Y. Thomas Hou, Wenjing Lou

arXiv:2507.09095v2 | 3 citations | h-index: 14

Originality: Incremental advance

AI Analysis

The paper exposes a critical security flaw in autonomous driving systems that can lead to collisions, but the advance is incremental because it builds on known synchronization issues.

The paper tackles the vulnerability of multimodal fusion in autonomous driving to temporal misalignment attacks, showing that induced delays in sensor streams can reduce car detection mAP by up to 88.5% and tracking accuracy by 73%.

Multimodal fusion (MMF) plays a critical role in the perception of autonomous driving, which primarily fuses camera and LiDAR streams for a comprehensive and efficient scene understanding. However, its strict reliance on precise temporal synchronization exposes it to new vulnerabilities. In this paper, we introduce DejaVu, an attack that exploits the in-vehicular network and induces delays across sensor streams to create subtle temporal misalignments, severely degrading downstream MMF-based perception tasks. Our comprehensive attack analysis across different models and datasets reveals the sensors' task-specific imbalanced sensitivities: object detection is overly dependent on LiDAR inputs, while object tracking is highly reliant on camera inputs. Consequently, with a single-frame LiDAR delay, an attacker can reduce the car detection mAP by up to 88.5%, while with a three-frame camera delay, multiple object tracking accuracy (MOTA) for cars drops by 73%. We further demonstrate two attack scenarios, using an automotive Ethernet testbed for hardware-in-the-loop validation and the Autoware stack for end-to-end AD simulation, which show the feasibility of the DejaVu attack and its severe impact, such as collisions and phantom braking.
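To make the notion of temporal misalignment concrete, the following is a minimal sketch of how a frame delay in one stream translates into a time offset between paired camera and LiDAR frames, and into a positional error for a moving object. It is an illustration only, not the paper's implementation: the 10 Hz frame rate, the 15 m/s relative speed, and every name in the code (Frame, make_stream, delay_stream, misalignment, stale_offset_m) are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int        # frame sequence number
    timestamp: float  # capture time in seconds


def make_stream(n_frames: int, rate_hz: float) -> list[Frame]:
    """Build an idealized sensor stream with evenly spaced capture times."""
    return [Frame(i, i / rate_hz) for i in range(n_frames)]


def delay_stream(stream: list[Frame], delay_frames: int) -> list[Frame]:
    """Model a delayed stream: at step i the fusion module receives
    frame i - delay_frames (clamped to the first frame)."""
    return [stream[max(i - delay_frames, 0)] for i in range(len(stream))]


def misalignment(camera: list[Frame], lidar: list[Frame]) -> list[float]:
    """Per-step time offset (seconds) between the paired frames."""
    return [c.timestamp - l.timestamp for c, l in zip(camera, lidar)]


def stale_offset_m(offset_s: float, relative_speed_mps: float) -> float:
    """Distance an object moves, relative to the ego vehicle, during the offset."""
    return offset_s * relative_speed_mps


RATE_HZ = 10.0         # assumed frame rate, not taken from the paper
REL_SPEED_MPS = 15.0   # assumed relative speed, not taken from the paper

camera = make_stream(5, RATE_HZ)
lidar = delay_stream(make_stream(5, RATE_HZ), delay_frames=1)

offsets = misalignment(camera, lidar)
errors = [stale_offset_m(o, REL_SPEED_MPS) for o in offsets]

for step, (o, e) in enumerate(zip(offsets, errors)):
    print(f"step {step}: time offset {o:.2f} s, positional error {e:.2f} m")
```

Under these assumed numbers, a single-frame LiDAR delay pairs each camera frame with a point cloud that is one frame period (0.1 s) old, so an object closing at 15 m/s appears displaced by about 1.5 m between the two modalities. The sketch says nothing about how a particular fusion model reacts to that displacement; the paper's measurements address that question.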
