CV · Jul 3, 2025

FMOcc: TPV-Driven Flow Matching for 3D Occupancy Prediction with Selective State Space Model

arXiv:2507.02250v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses accuracy limitations in 3D occupancy prediction for autonomous driving, offering a more efficient alternative to methods requiring historical data, though it appears incremental as it builds on existing TPV and flow matching techniques.

The paper tackles the problem of 3D semantic occupancy prediction in autonomous driving by proposing FMOcc, a method that uses flow matching and selective state space models to improve accuracy for occluded and distant scenes with few-frame input, achieving 43.1% RayIoU and 39.8% mIoU on Occ3D-nuScenes.
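For readers unfamiliar with the mIoU figure quoted above, here is a minimal sketch of how mean intersection-over-union is commonly computed over flattened per-voxel class labels. The function name `mean_iou` and the `ignore_index` parameter are illustrative assumptions, not taken from the paper, and the official Occ3D-nuScenes evaluation may differ in details this sketch does not cover.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: flat sequences of integer class labels, one per voxel.
    ignore_index: optional target label to skip (hypothetical parameter).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue
        if p == t:
            # Voxel counts toward both intersection and union of its class.
            inter[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

Classes that appear in neither the prediction nor the ground truth are left out of the average here; averaging over all classes regardless is another convention in use.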

3D semantic occupancy prediction plays a pivotal role in autonomous driving. However, inherent limitations of few-frame images and redundancy in 3D space compromise prediction accuracy for occluded and distant scenes. Existing methods enhance performance by fusing historical frame data, which requires additional data and significant computational resources. To address these issues, this paper proposes FMOcc, a Tri-perspective View (TPV) refinement occupancy network with a flow matching selective state space model for few-frame 3D occupancy prediction. First, to generate missing features, we design a feature refinement module based on a flow matching model, called the Flow Matching SSM module (FMSSM). Furthermore, by designing the TPV SSM layer and Plane Selective SSM (PS3M), we selectively filter TPV features to reduce the impact of air voxels on non-air voxels, thereby enhancing the overall efficiency of the model and its prediction capability for distant scenes. Finally, we design the Mask Training (MT) method to enhance the robustness of FMOcc and address the issue of sensor data loss. Experimental results on the Occ3D-nuScenes and OpenOcc datasets show that FMOcc outperforms existing state-of-the-art methods. With two-frame input, FMOcc achieves 43.1% RayIoU and 39.8% mIoU on Occ3D-nuScenes validation, and 42.6% RayIoU on OpenOcc, with 5.4 G of inference memory and 330 ms inference time.
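The abstract does not spell out the flow matching formulation used inside FMSSM. As background only, the sketch below shows the generic straight-line (rectified-flow style) variant of flow matching on plain Python lists: a training pair is built by interpolating between a source sample `x0` and a target sample `x1`, the regression target is the constant velocity `x1 - x0`, and sampling integrates the learned velocity with Euler steps. All names (`fm_training_pair`, `fm_loss`, `euler_sample`, `predict_velocity`) are hypothetical, and the linear path is an assumption, not the paper's stated design.

```python
def fm_training_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, plus its velocity."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]  # constant along a linear path
    return x_t, velocity

def fm_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at (x_t, t)."""
    x_t, target = fm_training_pair(x0, x1, t)
    pred = predict_velocity(x_t, t)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def euler_sample(predict_velocity, x0, steps=10):
    """Integrate dx/dt = predict_velocity(x, t) from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = predict_velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In a real model `predict_velocity` would be a trained network (in FMOcc's case, presumably one operating on TPV features); here it can be any callable taking `(x, t)` and returning a list of the same length as `x`.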

