SPARK: Scalable Real-Time Point Cloud Aggregation with Multi-View Self-Calibration
SPARK addresses challenges in multi-camera 3D perception for applications such as robotics and immersive interaction. It is a novel method aimed at known bottlenecks rather than a foundational breakthrough.
The paper tackles real-time multi-camera 3D reconstruction by proposing SPARK, a framework that jointly handles point cloud fusion and extrinsic uncertainty. Experiments on real-world systems show improvements in extrinsic accuracy, geometric consistency, temporal stability, and real-time performance.
Real-time multi-camera 3D reconstruction is crucial for 3D perception, immersive interaction, and robotics. Existing methods struggle with multi-view fusion, camera extrinsic uncertainty, and scalability for large camera setups. We propose SPARK, a self-calibrating real-time multi-camera point cloud reconstruction framework that jointly handles point cloud fusion and extrinsic uncertainty. SPARK consists of: (1) a geometry-aware online extrinsic estimation module leveraging multi-view priors and enforcing cross-view and temporal consistency for stable self-calibration, and (2) a confidence-driven point cloud fusion strategy modeling depth reliability and visibility at pixel and point levels to suppress noise and view-dependent inconsistencies. By performing frame-wise fusion without accumulation, SPARK produces stable point clouds in dynamic scenes while scaling linearly with the number of cameras. Extensive experiments on real-world multi-camera systems show that SPARK outperforms existing approaches in extrinsic accuracy, geometric consistency, temporal stability, and real-time performance, demonstrating its effectiveness and scalability for large-scale multi-camera 3D reconstruction.
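To make the idea of frame-wise, confidence-driven fusion concrete, the following is a minimal sketch and not the authors' implementation. It assumes each camera supplies points in its own frame, a per-point confidence in [0, 1], and a current extrinsic estimate (R, t). The voxel grid, the weighted averaging, and every name and threshold here (`fuse_frame`, `voxel_size`, `min_confidence`) are hypothetical choices made for illustration; the abstract does not specify how confidence is combined across views.

```python
from collections import defaultdict
from math import floor


def transform(point, rotation, translation):
    """Map a camera-frame point to the world frame: x_world = R @ x_cam + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def fuse_frame(cameras, voxel_size=0.05, min_confidence=0.5):
    """Fuse a single frame from all cameras; no state is kept between frames.

    cameras: list of dicts with keys
        "R"      3x3 rotation (nested lists), assumed current extrinsic estimate
        "t"      translation (3 floats)
        "points" list of (x, y, z) in the camera frame
        "conf"   list of per-point confidences in [0, 1] (assumed given)
    Returns a list of fused (x, y, z) points in the world frame.
    """
    # Per voxel: [sum(w*x), sum(w*y), sum(w*z), sum(w)]
    cells = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])

    # One pass per camera, so the cost grows linearly with the camera count.
    for cam in cameras:
        for point, weight in zip(cam["points"], cam["conf"]):
            if weight <= 0.0:
                continue  # skip points with no support
            world = transform(point, cam["R"], cam["t"])
            key = tuple(floor(c / voxel_size) for c in world)
            acc = cells[key]
            for i in range(3):
                acc[i] += weight * world[i]
            acc[3] += weight

    # Keep only voxels whose accumulated confidence is high enough,
    # and output the confidence-weighted mean position of each.
    fused = []
    for acc in cells.values():
        if acc[3] >= min_confidence:
            fused.append((acc[0] / acc[3], acc[1] / acc[3], acc[2] / acc[3]))
    return fused


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cameras = [
        {"R": identity, "t": (0.0, 0.0, 0.0),
         "points": [(0.01, 0.01, 1.01)], "conf": [0.9]},
        {"R": identity, "t": (1.0, 0.0, 0.0),
         "points": [(-0.98, 0.01, 1.01), (2.0, 2.0, 2.0)], "conf": [0.3, 0.1]},
    ]
    print(fuse_frame(cameras))
```

In this toy setup, the two cameras observe the same surface point, and their observations land in the same voxel and are merged with the higher-confidence view dominating. The isolated low-confidence point seen by only one camera falls below `min_confidence` and is dropped. Because the voxel map is rebuilt for every frame, nothing is accumulated over time, which mirrors the frame-wise design described in the abstract.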