MGP-KAD: Multimodal Geometric Priors and Kolmogorov-Arnold Decoder for Single-View 3D Reconstruction in Complex Scenes
This work addresses single-image 3D reconstruction, a problem relevant to robotics and computer vision. It is an incremental advance that combines existing techniques in a novel way.
The paper tackles single-view 3D reconstruction in complex real-world scenes by proposing MGP-KAD, a multimodal feature-fusion framework that combines RGB images with geometric priors and decodes the fused features with a hybrid decoder based on Kolmogorov-Arnold Networks (KAN). MGP-KAD achieves state-of-the-art performance on the Pix3D dataset, with significant improvements in geometric integrity, smoothness, and detail preservation.
Single-view 3D reconstruction in complex real-world scenes is challenging because of noise, object diversity, and the limited availability of datasets. To address these challenges, we propose MGP-KAD, a novel multimodal feature-fusion framework that integrates RGB images and a geometric prior to improve reconstruction accuracy. The geometric prior is generated by sampling and clustering ground-truth object data, producing class-level features that are dynamically adjusted during training to improve geometric understanding. Additionally, we introduce a hybrid decoder based on Kolmogorov-Arnold Networks (KAN) to overcome the limitations of traditional linear decoders in processing complex multimodal inputs. Extensive experiments on the Pix3D dataset demonstrate that MGP-KAD achieves state-of-the-art (SOTA) performance, significantly improving geometric integrity, smoothness, and detail preservation. Our work provides a robust and effective solution for advancing single-view 3D reconstruction in complex scenes.
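The abstract says only that the geometric prior comes from sampling and clustering ground-truth object data. As a minimal sketch, assuming each ground-truth object is a 3D point cloud and that the clustering step is plain k-means, class-level prior features could be built as below. The helper names (`sample_points`, `kmeans`, `build_class_priors`) and all parameter values are hypothetical illustrations, not the authors' code.

```python
import random
from collections import defaultdict

# Hypothetical sketch: assumes each ground-truth object is a 3D point cloud and that
# "clustering" means k-means. This is not the MGP-KAD implementation.
Point = tuple[float, float, float]


def sample_points(cloud: list[Point], n: int, rng: random.Random) -> list[Point]:
    """Uniformly sample n points (with replacement) from one ground-truth shape."""
    return [rng.choice(cloud) for _ in range(n)]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[Point], k: int, iters: int, rng: random.Random) -> list[Point]:
    """Lloyd's k-means; returns k centroids."""
    if k > len(points):
        raise ValueError("need at least k points")
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def build_class_priors(dataset, points_per_shape: int = 256, k: int = 8,
                       iters: int = 10, seed: int = 0) -> dict[str, list[Point]]:
    """dataset: iterable of (class_name, point_cloud) pairs.

    Pools sampled points per class and clusters them, so every class gets k
    centroid vectors that act as a class-level geometric prior.
    """
    rng = random.Random(seed)
    pooled: dict[str, list[Point]] = defaultdict(list)
    for cls, cloud in dataset:
        pooled[cls].extend(sample_points(cloud, points_per_shape, rng))
    return {cls: kmeans(pts, k, iters, rng) for cls, pts in pooled.items()}
```

In a trained network these centroids would presumably initialize learnable per-class parameters so that the prior can shift during training, as the abstract describes; how MGP-KAD actually performs that adjustment is not specified here.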
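In a Kolmogorov-Arnold Network, each edge carries a learnable univariate function and each output sums those functions of its inputs, y_j = sum_i phi_{j,i}(x_i), instead of applying a fixed nonlinearity after a linear map. The sketch below shows one such layer in plain Python. It swaps the B-spline parameterization of the original KAN for Gaussian radial basis functions plus a SiLU base term (an assumption made for brevity). The class name `RBFKANLayer` and the two-layer head are hypothetical; the abstract does not say how the hybrid decoder combines KAN layers with other components.

```python
import math
import random


def silu(x: float) -> float:
    """Numerically stable SiLU: x * sigmoid(x)."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


class RBFKANLayer:
    """Hypothetical KAN-style layer: y_j = sum_i phi_{j,i}(x_i).

    Each edge function is phi(x) = w_base * silu(x) + sum_m c_m * exp(-((x - mu_m) / h) ** 2),
    i.e. Gaussian RBFs on a fixed grid stand in for the B-splines of the original KAN.
    """

    def __init__(self, d_in: int, d_out: int, num_basis: int = 8,
                 x_min: float = -2.0, x_max: float = 2.0, seed: int = 0):
        if num_basis < 2:
            raise ValueError("num_basis must be at least 2")
        rng = random.Random(seed)
        self.d_in, self.d_out = d_in, d_out
        step = (x_max - x_min) / (num_basis - 1)
        self.centers = [x_min + m * step for m in range(num_basis)]
        self.width = step  # RBF width tied to the grid spacing
        # coeffs[j][i][m]: weight of basis function m on the edge from input i to output j
        self.coeffs = [[[rng.gauss(0.0, 0.1) for _ in range(num_basis)]
                        for _ in range(d_in)] for _ in range(d_out)]
        # base[j][i]: weight of the SiLU base term on the same edge
        self.base = [[rng.gauss(0.0, 1.0 / math.sqrt(d_in)) for _ in range(d_in)]
                     for _ in range(d_out)]

    def _basis(self, x: float) -> list[float]:
        return [math.exp(-(((x - mu) / self.width) ** 2)) for mu in self.centers]

    def __call__(self, x: list[float]) -> list[float]:
        if len(x) != self.d_in:
            raise ValueError(f"expected {self.d_in} inputs, got {len(x)}")
        # Evaluate SiLU and the RBF basis once per input; every edge reuses them.
        feats = [(silu(xi), self._basis(xi)) for xi in x]
        out = []
        for j in range(self.d_out):
            total = 0.0
            for i, (s, b) in enumerate(feats):
                total += self.base[j][i] * s
                total += sum(c * v for c, v in zip(self.coeffs[j][i], b))
            out.append(total)
        return out


if __name__ == "__main__":
    # Hypothetical head: a 6-D fused feature (e.g. RGB + prior) -> 4 hidden -> 1 output.
    head = [RBFKANLayer(6, 4, seed=1), RBFKANLayer(4, 1, seed=2)]
    h = [0.1, -0.4, 0.9, 0.0, 1.2, -0.7]
    for layer in head:
        h = layer(h)
    print(h)
```

Each output of a single layer is a sum of per-input functions, so the layer is additive in its inputs; expressiveness comes from stacking layers. Inputs far outside the RBF grid (here [-2, 2]) fall back to the SiLU base term, which is one reason KAN implementations often normalize intermediate features or adapt the grid to the data.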