3D-C2FT: Coarse-to-fine Transformer for Multi-view 3D Reconstruction
This work addresses the challenge of improving 3D reconstruction accuracy for computer vision applications, representing an incremental advancement in transformer-based methods.
The paper proposes 3D-C2FT, a multi-view 3D reconstruction model that uses a coarse-to-fine attention mechanism to encode multi-view features and rectify defective 3D objects. It outperforms several competing models on the ShapeNet and Multi-view Real-life datasets.
Recently, the transformer model has been successfully employed for the multi-view 3D reconstruction problem. However, challenges remain in designing an attention mechanism that explores the multi-view features and exploits their relations to reinforce the encoding-decoding modules. This paper proposes a new model, namely the 3D coarse-to-fine transformer (3D-C2FT), by introducing a novel coarse-to-fine (C2F) attention mechanism for encoding multi-view features and rectifying defective 3D objects. The C2F attention mechanism enables the model to learn multi-view information flow and synthesize 3D surface correction in a coarse to fine-grained manner. The proposed model is evaluated on the ShapeNet and Multi-view Real-life datasets. Experimental results show that 3D-C2FT achieves notable results and outperforms several competing models on these datasets.