CV | Jan 15

A Unified 3D Object Perception Framework for Real-Time Outside-In Multi-Camera Systems

arXiv:2601.10819v1 | h-index: 7
Originality: Incremental advance
AI Analysis

This paper addresses accurate 3D object perception and tracking for industrial infrastructure using outside-in camera networks. It is a domain-specific, incremental improvement.

The paper adapts autonomous driving models to static multi-camera networks for industrial infrastructure. It reports a state-of-the-art HOTA score of 45.22 on the AI City Challenge 2025 benchmark. An optimized TensorRT plugin for Multi-Scale Deformable Aggregation yields a 2.15× speedup, enabling a single Blackwell-class GPU to process over 64 concurrent camera streams in real time.
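For readers unfamiliar with the metric, HOTA (Higher Order Tracking Accuracy) in its standard definition combines detection and association quality at each localization threshold $\alpha$, then averages over a set of thresholds $\mathcal{T}$. The benchmark's exact matching protocol (for example, how 3D matches are scored) is not described on this page, so what follows is the general definition, not necessarily the variant used in the evaluation:

$$\mathrm{HOTA}_\alpha = \sqrt{\frac{\sum_{c \in \mathrm{TP}_\alpha} \mathcal{A}(c)}{|\mathrm{TP}_\alpha| + |\mathrm{FN}_\alpha| + |\mathrm{FP}_\alpha|}} = \sqrt{\mathrm{DetA}_\alpha \cdot \mathrm{AssA}_\alpha}, \qquad \mathrm{HOTA} = \frac{1}{|\mathcal{T}|} \sum_{\alpha \in \mathcal{T}} \mathrm{HOTA}_\alpha$$

Here $\mathcal{A}(c)$ is the association accuracy of a true-positive match $c$, and $\mathrm{DetA}_\alpha$ and $\mathrm{AssA}_\alpha$ are the detection and association accuracies at threshold $\alpha$. Each term lies in $[0, 1]$, so the reported 45.22 is presumably on a 0 to 100 scale.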

Accurate 3D object perception and multi-target multi-camera (MTMC) tracking are fundamental for the digital transformation of industrial infrastructure. However, transitioning "inside-out" autonomous driving models to "outside-in" static camera networks presents significant challenges due to heterogeneous camera placements and extreme occlusion. In this paper, we present an adapted Sparse4D framework specifically optimized for large-scale infrastructure environments. Our system leverages absolute world-coordinate geometric priors and introduces an occlusion-aware ReID embedding module to maintain identity stability across distributed sensor networks. To bridge the Sim2Real domain gap without manual labeling, we employ a generative data augmentation strategy using the NVIDIA COSMOS framework, creating diverse environmental styles that enhance the model's appearance-invariance. Evaluated on the AI City Challenge 2025 benchmark, our camera-only framework achieves a state-of-the-art HOTA of $45.22$. Furthermore, we address real-time deployment constraints by developing an optimized TensorRT plugin for Multi-Scale Deformable Aggregation (MSDA). Our hardware-accelerated implementation achieves a $2.15\times$ speedup on modern GPU architectures, enabling a single Blackwell-class GPU to support over 64 concurrent camera streams.
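To make the Multi-Scale Deformable Aggregation step more concrete, here is a minimal pure-Python reference sketch of the general operation as it is commonly described for Sparse4D-style models: each 3D keypoint is projected into the camera views, features are sampled bilinearly from every scale of each view, and the samples are fused with per-keypoint weights. This is not the authors' TensorRT plugin. The function names, argument layout, and normalized-coordinate convention are assumptions made for illustration only.

```python
import math

def bilinear_sample(feat, x, y):
    """Sample an H x W x C nested-list feature map at continuous pixel (x, y).
    Neighbours that fall outside the map contribute zero."""
    h, w, channels = len(feat), len(feat[0]), len(feat[0][0])
    x0, y0 = math.floor(x), math.floor(y)
    out = [0.0] * channels
    for yi, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xi < w and 0 <= yi < h:
                for c in range(channels):
                    out[c] += wy * wx * feat[yi][xi][c]
    return out

def deformable_aggregate(feature_maps, points, weights):
    """Fuse features for one object query (hypothetical layout).

    feature_maps[cam][scale] : H x W x C nested list
    points[k][cam]           : (u, v) in [0, 1], or None if keypoint k
                               does not project into camera cam
    weights[k][cam][scale]   : scalar fusion weight
    """
    channels = len(feature_maps[0][0][0][0])
    out = [0.0] * channels
    for k, per_cam in enumerate(points):
        for cam, uv in enumerate(per_cam):
            if uv is None:
                continue  # keypoint not visible in this camera
            u, v = uv
            for s, feat in enumerate(feature_maps[cam]):
                h, w = len(feat), len(feat[0])
                # Assumed convention: (0, 0) and (1, 1) map to corner pixels.
                sample = bilinear_sample(feat, u * (w - 1), v * (h - 1))
                for c in range(channels):
                    out[c] += weights[k][cam][s] * sample[c]
    return out

# One camera, two scales, one channel; the second keypoint is not visible.
maps = [[
    [[[0.0], [1.0]], [[2.0], [3.0]]],  # scale 0: 2 x 2 map
    [[[10.0]]],                        # scale 1: 1 x 1 map
]]
pts = [[(0.5, 0.5)], [None]]
wts = [[[0.5, 0.5]], [[1.0, 1.0]]]
print(deformable_aggregate(maps, pts, wts))  # [5.75]
```

The nested loops over keypoints, cameras, and scales perform many small, scattered memory reads, which is the kind of access pattern a fused GPU kernel can handle far more efficiently than a chain of generic tensor operations. That is a plausible motivation for a dedicated plugin, though the page does not describe the plugin's internals.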

