Yuting Wan

CV
h-index3
3papers
1citation
Novelty52%
AI Score41

3 Papers

CVApr 30, 2025Code
Is Intermediate Fusion All You Need for UAV-based Collaborative Perception?

Jiuwu Hao, Liguo Sun, Yuting Wan et al.

Collaborative perception enhances environmental awareness through inter-agent communication and is regarded as a promising solution to intelligent transportation systems. However, existing collaborative methods for Unmanned Aerial Vehicles (UAVs) overlook the unique characteristics of the UAV perspective, resulting in substantial communication overhead. To address this issue, we propose a novel communication-efficient collaborative perception framework based on late-intermediate fusion, dubbed LIF. The core concept is to exchange informative and compact detection results and shift the fusion stage to the feature representation level. In particular, we leverage vision-guided positional embedding (VPE) and box-based virtual augmented feature (BoBEV) to effectively integrate complementary information from various agents. Additionally, we innovatively introduce an uncertainty-driven communication mechanism that uses uncertainty evaluation to select high-quality and reliable shared areas. Experimental results demonstrate that our LIF achieves superior performance with minimal communication bandwidth, proving its effectiveness and practicality. Code and models are available at https://github.com/uestchjw/LIF.

CVFeb 16, 2025Code
FeaKM: Robust Collaborative Perception under Noisy Pose Conditions

Jiuwu Hao, Liguo Sun, Ti Xiang et al.

Collaborative perception is essential for networks of agents with limited sensing capabilities, enabling them to work together by exchanging information to achieve a robust and comprehensive understanding of their environment. However, localization inaccuracies often lead to significant spatial message displacement, which undermines the effectiveness of these collaborative efforts. To tackle this challenge, we introduce FeaKM, a novel method that employs Feature-level Keypoints Matching to effectively correct pose discrepancies among collaborating agents. Our approach begins by utilizing a confidence map to identify and extract salient points from intermediate feature representations, allowing for the computation of their descriptors. This step ensures that the system can focus on the most relevant information, enhancing the matching process. We then implement a target-matching strategy that generates an assignment matrix, correlating the keypoints identified by different agents. This is critical for establishing accurate correspondences, which are essential for effective collaboration. Finally, we employ a fine-grained transformation matrix to synchronize the features of all agents and ascertain their relative statuses, ensuring coherent communication among them. Our experimental results demonstrate that FeaKM significantly outperforms existing methods on the DAIR-V2X dataset, confirming its robustness even under severe noise conditions. The code and implementation details are available at https://github.com/uestchjw/FeaKM.

CVMar 2
physfusion: A Transformer-based Dual-Stream Radar and Vision Fusion Framework for Open Water Surface Object Detection

Yuting Wan, Liguo Sun, Jiuwu Hao et al.

Detecting water-surface targets for Unmanned Surface Vehicles (USVs) is challenging due to wave clutter, specular reflections, and weak appearance cues in long-range observations. Although 4D millimeter-wave radar complements cameras under degraded illumination, maritime radar point clouds are sparse and intermittent, with reflectivity attributes exhibiting heavy-tailed variations under scattering and multipath, making conventional fusion designs struggle to exploit radar cues effectively. We propose PhysFusion, a physics-informed radar-image detection framework for water-surface perception. The framework integrates: (1) a Physics-Informed Radar Encoder (PIR Encoder) with an RCS Mapper and Quality Gate, transforming per-point radar attributes into compact scattering priors and predicting point-wise reliability for robust feature learning under clutter; (2) a Radar-guided Interactive Fusion Module (RIFM) performing query-level radar-image fusion between semantically enriched radar features and multi-scale visual features, with the radar branch modeled by a dual-stream backbone including a point-based local stream and a transformer-based global stream using Scattering-Aware Self-Attention (SASA); and (3) a Temporal Query Aggregation module (TQA) aggregating frame-wise fused queries over a short temporal window for temporally consistent representations. Experiments on WaterScenes and FLOW demonstrate that PhysFusion achieves 59.7% mAP50:95 and 90.3% mAP50 on WaterScenes (T=5 radar history) using 5.6M parameters and 12.5G FLOPs, and reaches 94.8% mAP50 and 46.2% mAP50:95 on FLOW under radar+camera setting. Ablation studies quantify the contributions of PIR Encoder, SASA-based global reasoning, and RIFM.