Ref-SAM3D: Bridging SAM3D with Text for Reference 3D Reconstruction
This paper addresses text-guided 3D reconstruction for applications such as 3D editing and virtual environments, presenting an incremental extension to SAM3D.
It targets SAM3D's inability to reconstruct specific objects from textual descriptions. To do so, it introduces Ref-SAM3D, which uses text as a prior for zero-shot 3D reconstruction from a single RGB image and achieves competitive, high-fidelity results.
SAM3D has garnered widespread attention for its strong 3D object reconstruction capabilities. However, a key limitation remains: SAM3D cannot reconstruct specific objects referred to by textual descriptions, a capability that is essential for practical applications such as 3D editing, game development, and virtual environments. To address this gap, we introduce Ref-SAM3D, a simple yet effective extension to SAM3D that incorporates textual descriptions as a high-level prior, enabling text-guided 3D reconstruction from a single RGB image. Through extensive qualitative experiments, we show that Ref-SAM3D, guided only by natural language and a single 2D view, produces competitive, high-fidelity zero-shot reconstructions. Our results demonstrate that Ref-SAM3D effectively bridges the gap between 2D visual cues and 3D geometric understanding, offering a more flexible and accessible paradigm for reference-guided 3D reconstruction. Code is available at: https://github.com/FudanCVL/Ref-SAM3D.