StixelNExT++: Lightweight Monocular Scene Segmentation and Representation for Collective Perception
This work addresses scene representation for collective perception in autonomous systems. Its contribution is incremental, since it extends the established Stixel representation rather than introducing a new one.
The paper tackles monocular scene segmentation and representation for autonomous systems by proposing StixelNExT++, which infers 3D Stixels and clusters them for object segmentation. It achieves real-time performance, with computation times as low as 10 ms per frame, and reports competitive results on the Waymo dataset within a 30-meter range.
This paper presents StixelNExT++, a novel approach to scene representation for monocular perception systems. Building on the established Stixel representation, our method infers 3D Stixels and enhances object segmentation by clustering smaller 3D Stixel units. The approach achieves high compression of scene information while remaining adaptable to point cloud and bird's-eye-view representations. Our lightweight neural network, trained on automatically generated LiDAR-based ground truth, achieves real-time performance with computation times as low as 10 ms per frame. Experimental results on the Waymo dataset demonstrate competitive performance within a 30-meter range, highlighting the potential of StixelNExT++ for collective perception in autonomous systems.
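To make the idea of clustering 3D Stixels and converting them to a bird's-eye-view (BEV) representation more concrete, the following is a minimal sketch and not the method used in StixelNExT++. Everything in it is an assumption made for illustration: the `Stixel` fields, the pinhole intrinsics `fx` and `cx`, the distance threshold `max_gap`, and the choice of single-linkage clustering. The abstract does not specify the clustering algorithm or the Stixel parameterization.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Stixel:
    """Hypothetical 3D stixel: a thin vertical segment in one image column."""
    u: int          # image column (pixels)
    v_top: int      # top row of the segment (pixels)
    v_bottom: int   # bottom row of the segment (pixels)
    depth: float    # distance along the optical axis (metres)


def to_bev(stixel, fx, cx):
    """Map a stixel to a (lateral x, forward z) point using a pinhole model (assumed)."""
    x = (stixel.u - cx) * stixel.depth / fx
    return (x, stixel.depth)


def cluster_stixels(stixels, fx, cx, max_gap=0.5):
    """Single-linkage clustering in BEV (an assumed stand-in for the paper's clustering).

    Stixels connected by a chain of neighbours at most max_gap metres apart
    receive the same integer label.
    """
    points = [to_bev(s, fx, cx) for s in stixels]
    labels = [-1] * len(points)
    next_label = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue  # already assigned to an earlier cluster
        labels[seed] = next_label
        stack = [seed]
        while stack:  # flood-fill over neighbours within max_gap
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] != -1:
                    continue
                dx = points[i][0] - points[j][0]
                dz = points[i][1] - points[j][1]
                if hypot(dx, dz) <= max_gap:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Three neighbouring stixels on a nearby object and one distant stixel.
stixels = [
    Stixel(u=300, v_top=200, v_bottom=400, depth=10.0),
    Stixel(u=308, v_top=205, v_bottom=400, depth=10.1),
    Stixel(u=316, v_top=210, v_bottom=400, depth=10.2),
    Stixel(u=900, v_top=250, v_bottom=380, depth=25.0),
]
labels = cluster_stixels(stixels, fx=1000.0, cx=640.0)
print(labels)  # [0, 0, 0, 1]
```

The same `to_bev` points can be treated as a sparse point cloud, which illustrates why a Stixel list is a compact intermediate form: each object is described by a few column-wise segments rather than by every pixel it covers.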