Less is More: Efficient Point Cloud Reconstruction via Multi-Head Decoders
This work addresses efficiency and generalization in point cloud reconstruction for computer vision, offering an incremental improvement over standard single-head decoders.
The paper challenges the assumption that deeper decoders always improve point cloud reconstruction, showing that excessive depth leads to overfitting. It proposes a multi-head decoder architecture in which independent heads each handle a distinct subset of points, and it reports consistent improvements in metrics such as Chamfer Distance and F1-score on the ModelNet40 and ShapeNetPart datasets.
We challenge the common assumption that deeper decoder architectures always yield better performance in point cloud reconstruction. Our analysis reveals that, beyond a certain depth, increasing decoder complexity leads to overfitting and degraded generalization. We also propose a novel multi-head decoder architecture that exploits the inherent redundancy in point clouds: complete shapes are reconstructed by multiple independent heads, each operating on a distinct subset of points. The final output is obtained by concatenating the predictions from all heads, enhancing both diversity and fidelity. Extensive experiments on ModelNet40 and ShapeNetPart demonstrate that our approach outperforms standard single-head baselines, with consistent improvements across key metrics, including Chamfer Distance (CD), Hausdorff Distance (HD), Earth Mover's Distance (EMD), and F1-score. Our findings highlight that output diversity and architectural design can be more critical than depth alone for effective and efficient point cloud reconstruction.
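The multi-head idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (init_heads, decode, chamfer_distance), the two-layer MLP per head, the layer sizes, and the random initialization are all assumptions made for illustration. The sketch assumes each head maps a shared latent code to its own subset of output points and that the subsets are concatenated into the final cloud. The Chamfer Distance shown is one common variant (mean of squared nearest-neighbor distances in both directions); the paper may use a different normalization.

```python
import numpy as np


def init_heads(latent_dim, num_heads, points_per_head, hidden_dim=64, seed=0):
    """Create independent shallow MLP heads (hypothetical sizes and init)."""
    rng = np.random.default_rng(seed)
    heads = []
    for _ in range(num_heads):
        heads.append({
            "w1": rng.normal(0.0, 0.1, (latent_dim, hidden_dim)),
            "b1": np.zeros(hidden_dim),
            "w2": rng.normal(0.0, 0.1, (hidden_dim, points_per_head * 3)),
            "b2": np.zeros(points_per_head * 3),
        })
    return heads


def decode(latent, heads):
    """Decode one latent vector into a point cloud of shape (N, 3).

    Each head predicts its own subset of points; the subsets are
    concatenated along the point axis to form the full output.
    """
    parts = []
    for h in heads:
        hidden = np.maximum(latent @ h["w1"] + h["b1"], 0.0)  # ReLU
        points = (hidden @ h["w2"] + h["b2"]).reshape(-1, 3)
        parts.append(points)
    return np.concatenate(parts, axis=0)


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between clouds a (N, 3) and b (M, 3).

    Assumed variant: mean squared nearest-neighbor distance, summed
    over both directions.
    """
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


# Example: 4 heads of 256 points each give a 1024-point reconstruction.
heads = init_heads(latent_dim=128, num_heads=4, points_per_head=256)
latent = np.random.default_rng(1).normal(size=128)
cloud = decode(latent, heads)
print(cloud.shape)
```

In this sketch the heads share no parameters, so widening the output (more heads) is decoupled from deepening any single decoder, which is the trade-off the abstract argues for. In a trained model the weights would be learned by minimizing a reconstruction loss such as the Chamfer Distance against the ground-truth cloud.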