CV · AI · LG · Sep 21, 2025

Point-RTD: Replaced Token Denoising for Pretraining Transformer Models on Point Clouds

arXiv:2509.17207v1 · h-index: 3
Originality: Highly original
AI Analysis

This work aims to make transformer pretraining for 3D point cloud tasks more efficient and more effective. It proposes a method that outperforms the Point-MAE baseline on reconstruction error, Chamfer Distance, convergence speed, and classification accuracy.

The paper addresses pretraining of transformer models for 3D point clouds by introducing Point-RTD, a corruption-reconstruction strategy that replaces masking with token denoising. On ShapeNet, it reports over 93% lower reconstruction error than Point-MAE and more than 14x lower Chamfer Distance on the test set.
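Chamfer Distance, the point-set metric cited above, is commonly defined as the sum of the two directed average nearest-neighbour distances between a predicted and a reference point cloud. The stdlib-only Python sketch below assumes that definition. Implementations differ on squared vs. unsquared distances and on summing vs. averaging the two directions, and the summary does not say which variant Point-RTD or Point-MAE reports. The `squared` flag and the function names are therefore illustrative only.

```python
import math
from typing import Sequence

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _directed(src: Sequence[Point], dst: Sequence[Point], squared: bool) -> float:
    """Average distance from each point in src to its nearest neighbour in dst."""
    total = 0.0
    for a in src:
        d = min(_sq_dist(a, b) for b in dst)  # brute-force nearest neighbour
        total += d if squared else math.sqrt(d)
    return total / len(src)


def chamfer_distance(p: Sequence[Point], q: Sequence[Point],
                     squared: bool = True) -> float:
    """Symmetric Chamfer Distance: directed(p -> q) + directed(q -> p)."""
    if len(p) == 0 or len(q) == 0:
        raise ValueError("both point sets must be non-empty")
    return _directed(p, q, squared) + _directed(q, p, squared)


# Usage: one extra point in `a` that is 1 unit from its nearest match in `b`.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0)]
print(chamfer_distance(a, b))  # 0.5
```

The brute-force search costs O(|P|·|Q|) per call. That is fine for illustration, but practical training pipelines typically rely on batched GPU nearest-neighbour kernels instead.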

Pretraining strategies play a critical role in advancing the performance of transformer-based models for 3D point cloud tasks. In this paper, we introduce Point-RTD (Replaced Token Denoising), a novel pretraining strategy designed to improve token robustness through a corruption-reconstruction framework. Unlike traditional mask-based reconstruction tasks that hide data segments for later prediction, Point-RTD corrupts point cloud tokens and leverages a discriminator-generator architecture for denoising. This shift enables more effective learning of structural priors and significantly enhances model performance and efficiency. On the ShapeNet dataset, Point-RTD reduces reconstruction error by over 93% compared to Point-MAE, and achieves more than 14x lower Chamfer Distance on the test set. Our method also converges faster and yields higher classification accuracy on the ShapeNet, ModelNet10, and ModelNet40 benchmarks, clearly outperforming the baseline Point-MAE framework in every case.
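The corruption-reconstruction idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. Tokens are plain float lists. Corruption swaps a fraction of each sample's tokens for tokens drawn from a different point cloud in the batch, an ELECTRA-style replaced-token scheme assumed here because the abstract does not specify how tokens are corrupted. A discriminator would be scored by per-token binary cross-entropy on "was this token replaced?", and a generator by mean squared error against the clean tokens. The names `corrupt_tokens`, `replace_ratio`, `discriminator_bce`, and `denoising_mse` are hypothetical, and the networks themselves are omitted.

```python
import math
import random
from typing import List, Tuple

Token = List[float]   # one token embedding (hypothetical representation)
Sample = List[Token]  # token sequence for a single point cloud


def corrupt_tokens(batch: List[Sample], replace_ratio: float,
                   rng: random.Random) -> Tuple[List[Sample], List[List[int]]]:
    """Swap a fraction of each sample's tokens for tokens from another sample.

    ASSUMPTION: replacements come from a different point cloud in the batch.
    Returns (corrupted batch, labels), where label 1 marks a replaced position
    (even in the unlikely case the donor token equals the original).
    """
    if not 0.0 <= replace_ratio <= 1.0:
        raise ValueError("replace_ratio must lie in [0, 1]")
    if len(batch) < 2:
        raise ValueError("need at least two samples to draw replacements from")
    corrupted, labels = [], []
    for i, sample in enumerate(batch):
        n_replace = round(replace_ratio * len(sample))
        replaced = set(rng.sample(range(len(sample)), n_replace))
        donors = [j for j in range(len(batch)) if j != i]  # other point clouds
        new_sample, sample_labels = [], []
        for pos, tok in enumerate(sample):
            if pos in replaced:
                donor = batch[rng.choice(donors)]
                new_sample.append(list(rng.choice(donor)))
                sample_labels.append(1)
            else:
                new_sample.append(list(tok))
                sample_labels.append(0)
        corrupted.append(new_sample)
        labels.append(sample_labels)
    return corrupted, labels


def discriminator_bce(probs: List[List[float]], labels: List[List[int]],
                      eps: float = 1e-7) -> float:
    """Mean binary cross-entropy of per-token 'replaced?' probabilities."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    if count == 0:
        raise ValueError("empty input")
    return total / count


def denoising_mse(pred: List[Sample], target: List[Sample]) -> float:
    """Mean squared error between generator outputs and clean tokens (all positions)."""
    total, count = 0.0, 0
    for p_sample, t_sample in zip(pred, target):
        for p_tok, t_tok in zip(p_sample, t_sample):
            for a, b in zip(p_tok, t_tok):
                total += (a - b) ** 2
                count += 1
    if count == 0:
        raise ValueError("empty input")
    return total / count
```

Drawing replacements from other shapes is one way to make corrupted tokens plausible, so a discriminator cannot flag them from trivial statistics alone. Whether Point-RTD samples replacements this way, and whether its generator loss covers all positions or only replaced ones, are assumptions of this sketch.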
