Taming Transformers for Realistic Lidar Point Cloud Generation
This work improves Lidar point cloud generation for autonomous driving, but it is incremental, since it builds on existing transformer and VQ-VAE methods.
The paper tackles realistic Lidar point cloud generation by addressing the failure of diffusion models to model raydrop noise. It introduces LidarGRIT, which combines auto-regressive transformers with a VQ-VAE and outperforms state-of-the-art models on the KITTI-360 and KITTI odometry datasets.
Diffusion Models (DMs) have achieved State-Of-The-Art (SOTA) results in the Lidar point cloud generation task, benefiting from their stable training and iterative refinement during sampling. However, DMs often fail to realistically model Lidar raydrop noise due to their inherent denoising process. To retain the strength of iterative sampling while enhancing the generation of raydrop noise, we introduce LidarGRIT, a generative model that uses auto-regressive transformers to iteratively sample the range images in latent space rather than in image space. Furthermore, LidarGRIT utilises VQ-VAE to separately decode range images and raydrop masks. Our results show that LidarGRIT achieves superior performance compared to SOTA models on the KITTI-360 and KITTI odometry datasets. Code is available at: https://github.com/hamedhaghighi/LidarGRIT.
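The toy sketch below illustrates the pipeline described in the abstract: quantize latents to discrete tokens, sample tokens auto-regressively, then decode a range image and a raydrop mask through separate heads. It assumes a 4-entry 2-D codebook, a hand-written stand-in for the transformer prior, and per-token linear "decoders". Every name here (`CODEBOOK`, `quantize`, `toy_prior_logits`, `sample_tokens`, `decode`, `compose`) is hypothetical. This is not the LidarGRIT implementation, which uses learned encoders, decoders, and a trained transformer over 2-D latent grids.

```python
import math
import random

# Hypothetical toy codebook: 4 code vectors of dimension 2.
# A real VQ-VAE learns a much larger codebook.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def quantize(z):
    """VQ step: map each latent vector to the index of its nearest code (squared L2)."""
    indices = []
    for vec in z:
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in CODEBOOK]
        indices.append(min(range(len(CODEBOOK)), key=dists.__getitem__))
    return indices


def toy_prior_logits(prefix):
    """Hypothetical stand-in for the transformer: logits for the next token given
    the tokens sampled so far. It simply favours repeating the previous token."""
    logits = [0.0] * len(CODEBOOK)
    if prefix:
        logits[prefix[-1]] = 2.0
    return logits


def sample_tokens(length, rng):
    """Auto-regressive sampling: draw one token at a time from softmax(logits)."""
    tokens = []
    for _ in range(length):
        logits = toy_prior_logits(tokens)
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(l - m) for l in logits]
        tokens.append(rng.choices(range(len(CODEBOOK)), weights=weights)[0])
    return tokens


def decode(tokens):
    """Two hypothetical decoder heads sharing the same tokens:
    one predicts range values, the other predicts the raydrop mask."""
    codes = [CODEBOOK[t] for t in tokens]
    ranges = [1.0 + 10.0 * c[0] for c in codes]  # range head (toy linear map)
    keep_prob = [c[1] for c in codes]  # raydrop head: probability the ray returns
    mask = [1 if p >= 0.5 else 0 for p in keep_prob]
    return ranges, mask


def compose(ranges, mask):
    """Zero out dropped rays (assumed convention: range 0 means no return)."""
    return [r * m for r, m in zip(ranges, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = sample_tokens(8, rng)
    ranges, mask = decode(tokens)
    print(tokens, compose(ranges, mask))
```

In this sketch the raydrop mask comes from its own decoder head instead of being inferred from the range values, mirroring the abstract's point that range images and raydrop masks are decoded separately. Storing dropped pixels as 0 is an assumed convention, not something the abstract specifies.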