Zefan Huang

RO
h-index11
4papers
2citations
Novelty49%
AI Score39

4 Papers

73.7ROApr 21
RMGS-SLAM: Real-time Multi-sensor Gaussian Splatting SLAM

Dongen Li, Yi Liu, Junqi Liu et al.

Achieving real-time Simultaneous Localization and Mapping (SLAM) based on 3D Gaussian splatting (3DGS) in large-scale real-world environments remains challenging, as existing methods still struggle to jointly achieve low-latency pose estimation, continuous 3D Gaussian reconstruction, and long-term global consistency. In this paper, we present a tightly coupled LiDAR-Inertial-Visual 3DGS-based SLAM framework for real-time pose estimation and photorealistic mapping in large-scale real-world scenes. The system executes state estimation and 3D Gaussian primitive initialization in parallel with global Gaussian optimization, enabling continuous dense mapping. To improve Gaussian initialization quality and accelerate optimization convergence, we introduce a cascaded strategy that combines feed-forward predictions with geometric priors derived from voxel-based principal component analysis. To enhance global consistency, we perform loop closure directly on the optimized global Gaussian map by estimating loop constraints through Gaussian-based Generalized Iterative Closest Point registration, followed by pose-graph optimization. We also collect challenging large-scale looped outdoor sequences with hardware-synchronized LiDAR-camera-IMU and ground-truth trajectories for realistic evaluation. Extensive experiments on both public datasets and our dataset demonstrate that the proposed method achieves a state of the art among real-time efficiency, localization accuracy, and rendering quality across diverse real-world scenes.

CVJun 9, 2025
DINO-CoDT: Multi-class Collaborative Detection and Tracking with Vision Foundation Models

Xunjie He, Christina Dao Wen Lee, Meiling Wang et al.

Collaborative perception plays a crucial role in enhancing environmental understanding by expanding the perceptual range and improving robustness against sensor failures, which primarily involves collaborative 3D detection and tracking tasks. The former focuses on object recognition in individual frames, while the latter captures continuous instance tracklets over time. However, existing works in both areas predominantly focus on the vehicle superclass, lacking effective solutions for both multi-class collaborative detection and tracking. This limitation hinders their applicability in real-world scenarios, which involve diverse object classes with varying appearances and motion patterns. To overcome these limitations, we propose a multi-class collaborative detection and tracking framework tailored for diverse road users. We first present a detector with a global spatial attention fusion (GSAF) module, enhancing multi-scale feature learning for objects of varying sizes. Next, we introduce a tracklet RE-IDentification (REID) module that leverages visual semantics with a vision foundation model to effectively reduce ID SWitch (IDSW) errors, in cases of erroneous mismatches involving small objects like pedestrians. We further design a velocity-based adaptive tracklet management (VATM) module that adjusts the tracking interval dynamically based on object motion. Extensive experiments on the V2X-Real and OPV2V datasets show that our approach significantly outperforms existing state-of-the-art methods in both detection and tracking accuracy.

AIMay 19, 2025
AGI-Elo: How Far Are We From Mastering A Task?

Shuo Sun, Yimin Zhao, Christina Dao Wen Lee et al.

As the field progresses toward Artificial General Intelligence (AGI), there is a pressing need for more comprehensive and insightful evaluation frameworks that go beyond aggregate performance metrics. This paper introduces a unified rating system that jointly models the difficulty of individual test cases and the competency of AI models (or humans) across vision, language, and action domains. Unlike existing metrics that focus solely on models, our approach allows for fine-grained, difficulty-aware evaluations through competitive interactions between models and tasks, capturing both the long-tail distribution of real-world challenges and the competency gap between current models and full task mastery. We validate the generalizability and robustness of our system through extensive experiments on multiple established datasets and models across distinct AGI domains. The resulting rating distributions offer novel perspectives and interpretable insights into task difficulty, model progression, and the outstanding challenges that remain on the path to achieving full AGI task mastery.

RODec 2, 2019
Online Multi-Target Tracking for Maneuvering Vehicles in Dynamic Road Context

Zehui Meng, Qi Heng Ho, Zefan Huang et al.

Target detection and tracking provides crucial information for motion planning and decision making in autonomous driving. This paper proposes an online multi-object tracking (MOT) framework with tracking-by-detection for maneuvering vehicles under motion uncertainty in dynamic road context. We employ a point cloud based vehicle detector to provide real-time 3D bounding boxes of detected vehicles and conduct the online bipartite optimization of the maneuver-orientated data association between the detections and the targets. Kalman Filter (KF) is adopted as the backbone for multi-object tracking. In order to entertain the maneuvering uncertainty, we leverage the interacting multiple model (IMM) approach to obtain the \textit{a-posterior} residual as the cost for each association hypothesis, which is calculated with the hybrid model posterior (after mode-switch). Road context is integrated to conduct adjustments of the time varying transition probability matrix (TPM) of the IMM to regulate the maneuvers according to road segments and traffic sign/signals, with which the data association is performed in a unified spatial-temporal fashion. Experiments show our framework is able to effectively track multiple vehicles with maneuvers subject to dynamic road context and localization drift.