CV, Sep 18, 2024

Tracking Any Point with Frame-Event Fusion Network at High Frame Rate

arXiv:2409.11953v1, 8 citations, h-index: 4, Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the instability of frame-based point tracking in applications such as autonomous driving, though it is an incremental improvement over existing methods.

The paper tackles the problem of tracking points in high-speed scenarios by proposing FE-TAP, an image-event fusion tracker that combines frame context with event temporal resolution, achieving a 24% improvement in expected feature age on EDS datasets.

Tracking any point based on image frames is constrained by frame rates, leading to instability in high-speed scenarios and limited generalization in real-world applications. To overcome these limitations, we propose an image-event fusion point tracker, FE-TAP, which combines the contextual information from image frames with the high temporal resolution of events, achieving high frame rate and robust point tracking under various challenging conditions. Specifically, we designed an Evolution Fusion module (EvoFusion) to model the image generation process guided by events. This module can effectively integrate valuable information from both modalities operating at different frequencies. To achieve smoother point trajectories, we employed a transformer-based refinement strategy that updates the point's trajectories and features iteratively. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches, particularly improving expected feature age by 24% on EDS datasets. Finally, we qualitatively validated the robustness of our algorithm in real driving scenarios using our custom-designed high-resolution image-event synchronization device. Our source code will be released at https://github.com/ljx1002/FE-TAP.

