R3D-SWIN: Use Shifted Window Attention for Single-View 3D Reconstruction
This work improves 3D reconstruction accuracy for computer vision applications, though it appears incremental as it adapts an existing attention mechanism to a new task.
The paper tackles the problem of single-view 3D reconstruction by proposing a voxel-based network using shifted window attention, which addresses limitations in vision transformers like lack of multi-scale windows and inter-window connections. Experimental results on ShapeNet show it achieves state-of-the-art accuracy.
Recently, vision transformers have performed well in various computer vision tasks, including voxel 3D reconstruction. However, the windows of the vision transformer are not multi-scale, and there is no connection between the windows, which limits the accuracy of voxel 3D reconstruction. Therefore, we propose a voxel 3D reconstruction network based on shifted window attention. To the best of our knowledge, this is the first work to apply shifted window attention to voxel 3D reconstruction. Experimental results on ShapeNet verify that our method achieves state-of-the-art (SOTA) accuracy in single-view reconstruction.
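The inter-window connection that the abstract refers to can be illustrated with a small sketch. This is not the authors' implementation: it is a toy, pure-Python illustration of the general shifted-window idea on a 2D token grid, and the names `window_partition`, `cyclic_shift`, `SIZE`, and `WINDOW` are hypothetical. It assumes an 8x8 grid of tokens, 4x4 windows, and a shift of half a window. Each token is labelled with the index of its regular window, so counting distinct labels inside a window shows how many regular windows contribute tokens to it.

```python
def window_partition(grid, window_size):
    """Split a 2D grid (list of rows) into non-overlapping square windows.

    Returns the windows in row-major order; each window is a flat list of tokens.
    """
    height, width = len(grid), len(grid[0])
    assert height % window_size == 0 and width % window_size == 0
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([
                grid[r][c]
                for r in range(top, top + window_size)
                for c in range(left, left + window_size)
            ])
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` positions, wrapping around."""
    height, width = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % height][(c + shift) % width] for c in range(width)]
        for r in range(height)
    ]


# Assumed toy sizes: an 8x8 token grid with 4x4 windows.
SIZE, WINDOW = 8, 4

# Label each token with the index of the regular window it belongs to.
labels = [
    [(r // WINDOW) * (SIZE // WINDOW) + (c // WINDOW) for c in range(SIZE)]
    for r in range(SIZE)
]

regular = window_partition(labels, WINDOW)
shifted = window_partition(cyclic_shift(labels, WINDOW // 2), WINDOW)

# Count how many distinct regular windows contribute tokens to each window.
regular_mix = [len(set(w)) for w in regular]
shifted_mix = [len(set(w)) for w in shifted]
print(regular_mix, shifted_mix)  # [1, 1, 1, 1] [4, 4, 4, 4]
```

With regular partitioning, attention computed inside a window only sees tokens from that window. After the shift, every window contains tokens from four regular windows, so alternating the two partitions across layers lets information flow between windows. The sketch omits the attention computation itself, and it also omits the attention mask that shifted-window implementations typically apply so that tokens brought together only by the wrap-around do not attend to each other.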