CVAug 2, 2024
EVIT: Event-based Visual-Inertial Tracking in Semi-Dense Maps Using Windowed Nonlinear OptimizationRunze Yuan, Tao Liu, Zijia Dai et al.
Event cameras are an interesting visual exteroceptive sensor that reacts to brightness changes rather than integrating absolute image intensities. Owing to this design, the sensor exhibits strong performance in situations of challenging dynamics and illumination conditions. While event-based simultaneous tracking and mapping remains a challenging problem, a number of recent works have pointed out the sensor's suitability for prior map-based tracking. By making use of cross-modal registration paradigms, the camera's ego-motion can be tracked across a large spectrum of illumination and dynamics conditions on top of accurate maps that have been created a priori by more traditional sensors. The present paper follows up on a recently introduced event-based geometric semi-dense tracking paradigm, and proposes the addition of inertial signals in order to robustify the estimation. More specifically, the added signals provide strong cues for pose initialization as well as regularization during windowed, multi-frame tracking. As a result, the proposed framework achieves increased performance under challenging illumination conditions as well as a reduction of the rate at which intermediate event representations need to be registered in order to maintain stable tracking across highly dynamic sequences. Our evaluation focuses on a diverse set of real world sequences and comprises a comparison of our proposed method against a purely event-based alternative running at different rates.
CVSep 28, 2024
GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian SplattingTao Liu, Runze Yuan, Yi'ang Ju et al.
Reliable self-localization is a foundational skill for many intelligent mobile platforms. This paper explores the use of event cameras for motion tracking thereby providing a solution with inherent robustness under difficult dynamics and illumination. In order to circumvent the challenge of event camera-based mapping, the solution is framed in a cross-modal way. It tracks a map representation that comes directly from frame-based cameras. Specifically, the proposed method operates on top of gaussian splatting, a state-of-the-art representation that permits highly efficient and realistic novel view synthesis. The key of our approach consists of a novel pose parametrization that uses a reference pose plus first order dynamics for local differential image rendering. The latter is then compared against images of integrated events in a staggered coarse-to-fine optimization scheme. As demonstrated by our results, the realistic view rendering ability of gaussian splatting leads to stable and accurate tracking across a variety of both publicly available and newly recorded data sequences.
LGMar 5, 2024
World Models for Autonomous Driving: An Initial SurveyYanchen Guan, Haicheng Liao, Zhenning Li et al.
In the rapidly evolving landscape of autonomous driving, the capability to accurately predict future events and assess their implications is paramount for both safety and efficiency, critically aiding the decision-making process. World models have emerged as a transformative approach, enabling autonomous driving systems to synthesize and interpret vast amounts of sensor data, thereby predicting potential future scenarios and compensating for information gaps. This paper provides an initial review of the current state and prospective advancements of world models in autonomous driving, spanning their theoretical underpinnings, practical applications, and the ongoing research efforts aimed at overcoming existing limitations. Highlighting the significant role of world models in advancing autonomous driving technologies, this survey aspires to serve as a foundational reference for the research community, facilitating swift access to and comprehension of this burgeoning field, and inspiring continued innovation and exploration.
CVMar 27, 2025
ICG-MVSNet: Learning Intra-view and Cross-view Relationships for Guidance in Multi-View StereoYuxi Hu, Jun Zhang, Zhe Zhang et al.
Multi-view Stereo (MVS) aims to estimate depth and reconstruct 3D point clouds from a series of overlapping images. Recent learning-based MVS frameworks overlook the geometric information embedded in features and correlations, leading to weak cost matching. In this paper, we propose ICG-MVSNet, which explicitly integrates intra-view and cross-view relationships for depth estimation. Specifically, we develop an intra-view feature fusion module that leverages the feature coordinate correlations within a single image to enhance robust cost matching. Additionally, we introduce a lightweight cross-view aggregation module that efficiently utilizes the contextual information from volume correlations to guide regularization. Our method is evaluated on the DTU dataset and Tanks and Temples benchmark, consistently achieving competitive performance against state-of-the-art works, while requiring lower computational resources.
LGNov 27, 2025
Controllable risk scenario generation from human crash data for autonomous vehicle testingQiujing Lu, Xuanhan Wang, Runze Yuan et al.
Ensuring the safety of autonomous vehicles (AV) requires rigorous testing under both everyday driving and rare, safety-critical conditions. A key challenge lies in simulating environment agents, including background vehicles (BVs) and vulnerable road users (VRUs), that behave realistically in nominal traffic while also exhibiting risk-prone behaviors consistent with real-world accidents. We introduce Controllable Risk Agent Generation (CRAG), a framework designed to unify the modeling of dominant nominal behaviors and rare safety-critical behaviors. CRAG constructs a structured latent space that disentangles normal and risk-related behaviors, enabling efficient use of limited crash data. By combining risk-aware latent representations with optimization-based mode-transition mechanisms, the framework allows agents to shift smoothly and plausibly from safe to risk states over extended horizons, while maintaining high fidelity in both regimes. Extensive experiments show that CRAG improves diversity compared to existing baselines, while also enabling controllable generation of risk scenarios for targeted and efficient evaluation of AV robustness.