CV RO Jul 23, 2025

Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

arXiv:2507.17664v2 5 citations h-index: 8
Originality: Highly original
AI Analysis

This work addresses the problem of language-driven object grounding in event-based perception for real-world robotics and autonomy, representing a foundational step rather than an incremental improvement.

The paper tackles the challenge of connecting asynchronous event camera streams to human language for dynamic scene understanding. It introduces Talk2Event, a large-scale benchmark with over 30,000 validated referring expressions, and EventRefer, an attribute-aware grounding framework that reports consistent gains over state-of-the-art baselines in event-only, frame-only, and event-frame fusion settings.

Event cameras offer microsecond-level latency and robustness to motion blur, making them ideal for understanding dynamic environments. Yet, connecting these asynchronous streams to human language remains an open challenge. We introduce Talk2Event, the first large-scale benchmark for language-driven object grounding in event-based perception. Built from real-world driving data, we provide over 30,000 validated referring expressions, each enriched with four grounding attributes -- appearance, status, relation to viewer, and relation to other objects -- bridging spatial, temporal, and relational reasoning. To fully exploit these cues, we propose EventRefer, an attribute-aware grounding framework that dynamically fuses multi-attribute representations through a Mixture of Event-Attribute Experts (MoEE). Our method adapts to different modalities and scene dynamics, achieving consistent gains over state-of-the-art baselines in event-only, frame-only, and event-frame fusion settings. We hope our dataset and approach will establish a foundation for advancing multimodal, temporally-aware, and language-driven perception in real-world robotics and autonomy.
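The abstract says EventRefer dynamically fuses multi-attribute representations through a mixture of experts. As a rough illustration of that general idea only, the sketch below applies softmax-gated weighting to four per-attribute feature vectors. It is not the authors' MoEE implementation: the function names, the plain-list feature format, and the scalar gate scores are all assumptions made for this example.

```python
import math

# The four grounding attributes named in the abstract (identifier spellings are ours).
ATTRIBUTES = ("appearance", "status", "relation_to_viewer", "relation_to_others")


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attribute_features(features, gate_scores):
    """Weighted sum of per-attribute feature vectors.

    features:    dict mapping attribute name -> list of floats (equal lengths)
    gate_scores: dict mapping attribute name -> unnormalised gate score
                 (hypothetical; in a real model a learned gate would produce these)
    Returns (fused_vector, weights_by_attribute).
    """
    weights = softmax([gate_scores[name] for name in ATTRIBUTES])
    dim = len(features[ATTRIBUTES[0]])
    fused = [0.0] * dim
    for weight, name in zip(weights, ATTRIBUTES):
        for i, value in enumerate(features[name]):
            fused[i] += weight * value
    return fused, dict(zip(ATTRIBUTES, weights))


if __name__ == "__main__":
    feats = {
        "appearance": [1.0, 0.0],
        "status": [0.0, 1.0],
        "relation_to_viewer": [1.0, 1.0],
        "relation_to_others": [0.0, 0.0],
    }
    # A higher score for "status" shifts the fused vector toward that attribute.
    scores = {name: 0.0 for name in ATTRIBUTES}
    scores["status"] = 2.0
    fused, weights = fuse_attribute_features(feats, scores)
    print(fused, weights)
```

With equal gate scores, every attribute gets weight 0.25 and the fused vector is the plain average. Raising one score shifts the output toward that attribute, which is the kind of adaptive weighting the abstract's word "dynamically" suggests.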

