STRinGS: Selective Text Refinement in Gaussian Splatting
This work addresses the challenge of reconstructing readable text in 3D scenes, with applications in scene understanding and text-aware reconstruction. It represents an incremental improvement over existing 3DGS methods.
The paper tackles the problem of preserving fine-grained text details in 3D Gaussian Splatting (3DGS) reconstructions, where small reconstruction errors often cause significant semantic loss. It proposes STRinGS, a selective refinement framework that improves text readability by 63.6% relative to 3DGS at 7K iterations.
Text in the form of signs, labels, or instructions is a critical element of real-world scenes because it conveys important contextual information. 3D representations such as 3D Gaussian Splatting (3DGS) achieve high visual fidelity but struggle to preserve fine-grained text details, and small errors in reconstructing textual elements can lead to significant semantic loss. We propose STRinGS, a text-aware, selective refinement framework that addresses this issue for 3DGS reconstruction. Our method treats text and non-text regions separately: it refines text regions first, then merges them with non-text regions for full-scene optimization. STRinGS produces sharp, readable text even in challenging configurations. We introduce a text readability measure, OCR Character Error Rate (CER), to evaluate efficacy on text regions. STRinGS achieves a 63.6% relative improvement over 3DGS at just 7K iterations. We also introduce a curated dataset, STRinGS-360, with diverse text scenarios for evaluating text readability in 3D reconstruction. Together, our method and dataset push the boundaries of 3D scene understanding in text-rich environments, paving the way for more robust text-aware reconstruction methods.
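
To make the readability measure concrete, the following is a minimal sketch of how a character error rate and a relative improvement figure could be computed. It assumes the standard definition of CER (Levenshtein edit distance between the OCR output and the ground-truth string, divided by the ground-truth length) and assumes that the reported improvement is a relative reduction in CER; the abstract does not spell out either detail. The function names and example strings are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i characters of ref
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate under the assumed standard definition."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline_cer: float, method_cer: float) -> float:
    """Fractional reduction in CER relative to a baseline (assumed definition)."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - method_cer) / baseline_cer


if __name__ == "__main__":
    # Illustrative strings only: ground truth vs. a hypothetical OCR reading
    # of text rendered from a reconstruction.
    print(cer("EXIT", "EX1T"))              # one substitution over four characters
    print(relative_improvement(0.5, 0.25))  # made-up CER values, not paper results
```

In practice the hypothesis string would come from running an OCR engine on rendered views of the text regions. Because CER can exceed 1 when the OCR output is much longer than the reference, a relative comparison against a baseline is often easier to interpret than the raw value.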