TrackOcc: Camera-based 4D Panoptic Occupancy Tracking
This work addresses the need of advanced autonomous systems for spatially comprehensive and temporally consistent perception, though it appears incremental because it builds on existing tasks such as 3D object tracking and semantic occupancy prediction.
The paper tackles the problem of achieving comprehensive and consistent dynamic scene understanding from camera input by introducing a new task, Camera-based 4D Panoptic Occupancy Tracking, and proposes TrackOcc, an end-to-end method that achieves state-of-the-art performance on the Waymo dataset.
Comprehensive and consistent dynamic scene understanding from camera input is essential for advanced autonomous systems. Traditional camera-based perception tasks such as 3D object tracking and semantic occupancy prediction lack either spatial comprehensiveness or temporal consistency. In this work, we introduce a new task, Camera-based 4D Panoptic Occupancy Tracking, which simultaneously addresses panoptic occupancy segmentation and object tracking from camera-only input. Furthermore, we propose TrackOcc, an approach that processes image inputs in a streaming, end-to-end manner, using 4D panoptic queries to address the proposed task. By leveraging a localization-aware loss, TrackOcc improves the accuracy of 4D panoptic occupancy tracking without relying on additional tricks. Experimental results demonstrate that our method achieves state-of-the-art performance on the Waymo dataset. The source code will be released at https://github.com/Tsinghua-MARS-Lab/TrackOcc.
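
To make the task's output more concrete, the following is a minimal sketch of what a 4D panoptic occupancy result could look like: a per-frame voxel grid carrying a semantic class and an instance identifier that stays fixed for the same object across frames. Everything here is an illustrative assumption rather than the TrackOcc implementation: the class ids (`FREE`, `ROAD`, `CAR`), the grid shape, and the helper names `make_frame` and `track_ids` are hypothetical and do not come from the paper or its code.

```python
import numpy as np

# Hypothetical class ids, chosen for illustration only.
FREE, ROAD, CAR = 0, 1, 2
THING_CLASSES = {CAR}  # classes whose voxels carry a track id


def make_frame(car_x, car_track_id, shape=(8, 8, 4)):
    """Build one toy frame: a semantic grid and an instance (track id) grid."""
    semantic = np.full(shape, FREE, dtype=np.int64)
    instance = np.zeros(shape, dtype=np.int64)   # 0 means "no instance"
    semantic[:, :, 0] = ROAD                     # ground layer is road
    semantic[car_x:car_x + 2, 3:5, 1] = CAR      # a 2x2x1 car above the road
    instance[car_x:car_x + 2, 3:5, 1] = car_track_id
    return semantic, instance


def track_ids(semantic, instance):
    """Return the set of track ids carried by thing-class voxels in one frame."""
    mask = np.isin(semantic, list(THING_CLASSES))
    return set(np.unique(instance[mask]).tolist())


# Two consecutive frames: the car moves one voxel along x and keeps id 7.
frames = [make_frame(car_x=2, car_track_id=7),
          make_frame(car_x=3, car_track_id=7)]
ids_per_frame = [track_ids(s, i) for s, i in frames]

# Ids present in every frame: these are the temporally consistent tracks.
persistent = set.intersection(*ids_per_frame)
print(ids_per_frame, persistent)
```

In this toy setting, spatial comprehensiveness corresponds to every voxel receiving a semantic label, and temporal consistency corresponds to the car's voxels keeping the same identifier from one frame to the next.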