SD CV MM AS | Dec 22, 2024

AV-DTEC: Self-Supervised Audio-Visual Fusion for Drone Trajectory Estimation and Classification

Zhenyuan Xiao, Yizhuo Yang, Guili Xu, Xianglong Zeng, Shenghai Yuan

arXiv:2412.16928v18.310 citations | h-index: 9 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses public safety threats from compact UAVs by providing a more efficient anti-UAV system, though it appears incremental because it builds on existing fusion and self-supervised methods.

The paper tackles drone trajectory estimation and classification using a lightweight self-supervised audio-visual fusion system, and it reports high accuracy on real-world multi-modality data.

The increasing use of compact UAVs has created significant threats to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we propose AV-DTEC, a lightweight self-supervised audio-visual fusion-based anti-UAV system. AV-DTEC is trained using self-supervised learning with labels generated by LiDAR, and it simultaneously learns audio and visual features through a parallel selective state-space model. With the learned features, a specially designed plug-and-play primary-auxiliary feature enhancement module integrates visual features into audio features for better robustness in cross-lighting conditions. To reduce reliance on auxiliary features and align modalities, we propose a teacher-student model that adaptively adjusts the weighting of visual features. AV-DTEC demonstrates exceptional accuracy and effectiveness in real-world multi-modality data. The code and trained models are publicly accessible on GitHub: https://github.com/AmazingDay1/AV-DETC.
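
To make the primary-auxiliary idea in the abstract concrete, here is a minimal sketch of a gated fusion step in which audio is the primary modality and visual features are added with an adaptive weight. It is not the authors' module: the function name `fuse_audio_visual`, the sigmoid gate, and the parameters `gate_weights` and `gate_bias` are all assumptions chosen for illustration, and the feature vectors are plain Python lists rather than learned embeddings.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_audio_visual(audio, visual, gate_weights, gate_bias=0.0):
    """Hypothetical gated fusion: audio is primary, visual is auxiliary.

    A scalar gate in (0, 1) is computed from both feature vectors and
    scales the visual contribution before it is added to the audio features.
    Returns the fused feature list and the gate value.
    """
    audio = list(audio)
    visual = list(visual)
    if len(audio) != len(visual):
        raise ValueError("audio and visual features must have the same length")
    if len(gate_weights) != 2 * len(audio):
        raise ValueError("gate_weights must cover the concatenated features")

    # Score the concatenated [audio, visual] vector with a linear layer (assumed).
    joint = audio + visual
    score = gate_bias + sum(w * x for w, x in zip(gate_weights, joint))
    gate = sigmoid(score)

    # Residual-style fusion: audio passes through unchanged, visual is down-weighted.
    fused = [a + gate * v for a, v in zip(audio, visual)]
    return fused, gate


# With an all-zero gate layer the gate sits at 0.5, so half of each visual
# feature is added to the corresponding audio feature.
fused, gate = fuse_audio_visual([1.0, 2.0], [4.0, -2.0], [0.0] * 4)

# A strongly negative bias drives the gate toward 0, leaving the audio
# features almost untouched (for instance, when the image is uninformative).
fused_dark, gate_dark = fuse_audio_visual([1.0, 2.0], [4.0, -2.0], [0.0] * 4, gate_bias=-50.0)
```

In this sketch the gate is a single scalar, so the visual modality is scaled as a whole; a per-feature gate would be a natural variation. How AV-DTEC actually computes and trains its weighting (via the teacher-student model) is described in the paper and repository, not here.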
